Cyber & Autonomous Systems | AISI Work Category

Research

Measuring AI Agents’ Progress on Multi-Step Cyber Attack Scenarios

Cyber & Autonomous Systems • March 16, 2026

RepliBench: Evaluating the autonomous replication capabilities of language model agents

Cyber & Autonomous Systems • May 21, 2025

Blog

How fast is autonomous AI cyber capability advancing?

Cyber & Autonomous Systems • May 13, 2026

The length of tasks that frontier models can autonomously complete in our narrow cyber suite has been doubling every few months. The doubling rate has accelerated over time, and recent models have exceeded our previous trends.

Our evaluation of OpenAI's GPT-5.5 cyber capabilities

Cyber & Autonomous Systems • April 30, 2026

AISI conducted cyber evaluations on OpenAI's GPT-5.5. GPT-5.5 is one of the strongest models we have tested on our cyber tasks and is the second model to solve one of our multi-step cyber-attack simulations end-to-end.

Our evaluation of Claude Mythos Preview’s cyber capabilities

Cyber & Autonomous Systems • April 13, 2026

We conducted cyber evaluations of Anthropic’s Claude Mythos Preview and found continued improvement in capture-the-flag (CTF) challenges and significant improvement on multi-step cyber-attack simulations.

Harnessing frontier AI for cyber defence

Cyber & Autonomous Systems • March 31, 2026

Sharing work with the National Cyber Security Centre (NCSC) on how cyber defenders can use advanced AI capabilities to stay ahead of attackers.

How do frontier AI agents perform in multi-step cyber-attack scenarios?

Cyber & Autonomous Systems • March 16, 2026

We tested seven large language models (LLMs) on two custom-built cyber ranges, measuring their ability to execute extended attack sequences in complex environments.

Inspect Cyber: A New Standard for Agentic Cyber Evaluations

Cyber & Autonomous Systems • June 26, 2025

RepliBench: measuring autonomous replication capabilities in AI systems

Cyber & Autonomous Systems • April 22, 2025

A comprehensive benchmark to detect emerging replication abilities in AI systems and provide a quantifiable understanding of potential risks.

Cross-post: "Interviewing AI researchers on automation of AI R&D" by Epoch AI

Cyber & Autonomous Systems • August 27, 2024

AISI funded Epoch AI to explore AI researchers’ differing predictions on the automation of AI research and development and their suggestions for how to evaluate relevant capabilities.
