Cyber and Autonomous Systems

Blog

Our evaluation of OpenAI's GPT-5.5 cyber capabilities

AISI conducted cyber evaluations on OpenAI's GPT-5.5. GPT-5.5 is one of the strongest models we have tested on our cyber tasks and is the second model to solve one of our multi-step cyber-attack simulations end-to-end.

Our evaluation of Claude Mythos Preview’s cyber capabilities

We conducted cyber evaluations of Anthropic’s Claude Mythos Preview and found continued improvement in capture-the-flag (CTF) challenges and significant improvement on multi-step cyber-attack simulations.

Harnessing frontier AI for cyber defence

Sharing work with the National Cyber Security Centre (NCSC) on how cyber defenders can use advanced AI capabilities to stay ahead of attackers.

How do frontier AI agents perform in multi-step cyber-attack scenarios?

We tested seven large language models (LLMs) on two custom-built cyber ranges, measuring their ability to execute extended attack sequences in complex environments.

RepliBench: measuring autonomous replication capabilities in AI systems

A comprehensive benchmark to detect emerging replication abilities in AI systems and provide a quantifiable understanding of potential risks.

Cross-post: "Interviewing AI researchers on automation of AI R&D" by Epoch AI

August 27, 2024

AISI funded Epoch AI to explore AI researchers’ differing predictions on the automation of AI research and development and their suggestions for how to evaluate relevant capabilities.