AISI Blog | The AI Security Institute

Exploring how far cyber security approaches can help mitigate risks in generative AI systems, in collaboration with the National Cyber Security Centre (NCSC).

Managing risks from increasingly capable open-weight AI systems

Research • Aug 29, 2025

Current methods and open problems in open-weight model risk management.

The Inspect Sandboxing Toolkit: Scalable and secure AI agent evaluations

Research • Aug 7, 2025

A comprehensive toolkit for safely evaluating AI agents.

Announcing the Alignment Project: A global fund of over £15 million for AI alignment research

Organisation • Jul 30, 2025

Navigating the uncharted: Building societal resilience to frontier AI

Research • Jul 24, 2025

We outline our approach to studying and addressing AI risks in real-world applications.

International joint testing exercise: Agentic testing

Research • Jul 17, 2025

Advancing methodologies for agentic evaluations across domains, including leakage of sensitive information, fraud, and cybersecurity threats.

A structured protocol for elicitation experiments

Research • Jul 16, 2025

Calibrating AI risk assessment through rigorous elicitation practices.

Why we're working on white box control

Research • Jul 10, 2025

An introduction to white box control, and an update on our research so far.

LLM judges on trial: A new statistical framework to assess autograders

Research • Jul 9, 2025

Our new framework can assess the reliability of LLM evaluators, while simultaneously answering a primary research question.

How will AI enable the crimes of the future?

Research • Jul 3, 2025

How we're working to track and mitigate criminal misuse of AI.

Inspect Cyber: A New Standard for Agentic Cyber Evaluations

Research • Jun 26, 2025

New updates to the AISI Challenge Fund

Organisation • Jun 5, 2025

Making Safeguard Evaluations Actionable

Research • May 29, 2025

An Example Safety Case for Safeguards Against Misuse

HiBayES: Improving LLM Evaluation with Hierarchical Bayesian Modelling

Research • May 12, 2025

HiBayES: a flexible, robust statistical modelling framework that accounts for the nuances and hierarchical structure of advanced evaluations.

Research Agenda

Research • May 6, 2025

We outline our research priorities, our approach to developing technical solutions to the most pressing AI concerns, and the key risks that must be addressed as AI capabilities advance.

RepliBench: measuring autonomous replication capabilities in AI systems

Research • Apr 22, 2025

A comprehensive benchmark to detect emerging replication abilities in AI systems and provide a quantifiable understanding of potential risks.

How to evaluate control measures for AI agents?

Research • Apr 11, 2025

Our new paper outlines how AI control methods can mitigate misalignment risks as the capabilities of AI systems increase.

Strengthening AI Resilience

Organisation • Apr 3, 2025

20 Systemic Safety Grant Awardees Announced

How we’re addressing the gap between AI capabilities and mitigations

Research • Mar 11, 2025

We outline our approach to technical solutions for misuse and loss of control.

How can safety cases be used to help with frontier AI safety?

Research • Feb 10, 2025

Our new papers show how safety cases can help AI developers turn plans in their safety frameworks into action.

Principles for Safeguard Evaluation

Research • Feb 4, 2025

Our new paper proposes core principles for evaluating misuse safeguards.

Pre-Deployment Evaluation of OpenAI’s o1 Model

Research • Dec 18, 2024

The UK Artificial Intelligence Safety Institute and the U.S. Artificial Intelligence Safety Institute conducted a joint pre-deployment evaluation of OpenAI's o1 model.

Long-Form Tasks

Research • Dec 3, 2024

A Methodology for Evaluating Scientific Assistants

Pre-Deployment Evaluation of Anthropic’s Upgraded Claude 3.5 Sonnet

Research • Nov 19, 2024

The UK Artificial Intelligence Safety Institute and the U.S. Artificial Intelligence Safety Institute conducted a joint pre-deployment evaluation of Anthropic’s latest model.

Safety case template for ‘inability’ arguments

Research • Nov 14, 2024

How to write part of a safety case showing a system does not have offensive cyber capabilities

Our First Year

Organisation • Nov 13, 2024

The AI Safety Institute reflects on its first year

Announcing Inspect Evals

Research • Nov 13, 2024

We’re open-sourcing dozens of LLM evaluations to advance safety research in the field.

Bounty programme for novel evaluations and agent scaffolding

Research • Nov 5, 2024

We are launching a bounty for novel evaluations and agent scaffolds to help assess dangerous capabilities in frontier AI systems.

Early lessons from evaluating frontier AI systems

Research • Oct 24, 2024

We look into the evolving role of third-party evaluators in assessing AI safety, and explore how to design robust, impactful testing frameworks.

Advancing the field of systemic AI safety: grants open

Organisation • Oct 15, 2024

Calling researchers from academia, industry, and civil society to apply for up to £200,000 of funding.

Why I joined AISI by Geoffrey Irving

Organisation • Oct 3, 2024

Our Chief Scientist, Geoffrey Irving, on why he joined the UK AI Safety Institute and why he thinks other technical folk should too.

Should AI systems behave like people?

Research • Sep 25, 2024

We studied whether people want AI to be more human-like.

Early Insights from Developing Question-Answer Evaluations for Frontier AI

Research • Sep 23, 2024

A common technique for quickly assessing AI capabilities is prompting models to answer hundreds of questions, then automatically scoring the answers. We share insights from months of using this method.

Conference on frontier AI safety frameworks

Research • Sep 19, 2024

AISI is bringing together AI companies and researchers for an invite-only conference to accelerate the design and implementation of frontier AI safety frameworks. This post shares the call for submissions that we sent to conference attendees.

Cross-post: "Interviewing AI researchers on automation of AI R&D" by Epoch AI

Research • Aug 27, 2024

AISI funded Epoch AI to explore AI researchers’ differing predictions on the automation of AI research and development and their suggestions for how to evaluate relevant capabilities.

Safety cases at AISI

Research • Aug 23, 2024

As a complement to our empirical evaluations of frontier AI models, AISI is planning a series of collaborations and research projects sketching safety cases for more advanced models than exist today, focusing on risks from loss of control and autonomy. By a safety case, we mean a structured argument that an AI system is safe within a particular training or deployment context.

Announcing our San Francisco office

Organisation • May 20, 2024

We are opening an office in San Francisco! This will enable us to hire more top talent, collaborate closely with the US AI Safety Institute and engage even more with the wider AI research community.

Fourth progress report

Organisation • May 20, 2024

Since February, we have released our first technical blog post, published the International Scientific Report on the Safety of Advanced AI, open-sourced our testing platform Inspect, announced our San Francisco office, announced a partnership with the Canadian AI Safety Institute, grown our technical team to more than 30 researchers, and appointed Jade Leung as our Chief Technology Officer.

Advanced AI evaluations at AISI: May update

Research • May 20, 2024

We tested leading AI models for cyber, chemical, biological, and agent capabilities and safeguards effectiveness. Our first technical blog post shares a snapshot of our methods and results.

International Scientific Report on the Safety of Advanced AI: Interim Report

Research • May 17, 2024

This is an up-to-date, evidence-based report on the science of advanced AI safety. It highlights findings about AI progress, risks, and areas of disagreement in the field. The report is chaired by Yoshua Bengio and coordinated by AISI.

Open sourcing our testing framework Inspect

Research • Apr 21, 2024

We open-sourced our framework for large language model evaluation, which provides facilities for prompt engineering, tool usage, multi-turn dialogue, and model-graded evaluations.

Announcing the UK and US AISI partnership

Governance • Apr 2, 2024

The UK and US AI Safety Institutes signed a landmark agreement to jointly test advanced AI models, share research insights, share model access and enable expert talent transfers.

Announcing the UK and France AI Research Institutes’ collaboration

Governance • Feb 29, 2024

The UK AI Safety Institute and France’s Inria (The National Institute for Research in Digital Science and Technology) are partnering to advance AI safety research.

Our approach to evaluations

Research • Feb 9, 2024

This post offers an overview of why we are doing this work, what we are testing for, how we select models, our recent demonstrations and some plans for our future work.

Third progress report

Organisation • Feb 5, 2024

Since October, we have recruited leaders from DeepMind and Oxford, onboarded 23 new researchers, published the principles behind the International Scientific Report on Advanced AI Safety, and begun pre-deployment testing of advanced AI systems.

First AI Safety Summit

Governance • Nov 2, 2023

At the first AI Safety Summit at Bletchley Park, world leaders and top companies agreed on the significance of advanced AI risks and the importance of testing.

Second progress report

Organisation • Oct 30, 2023

Since September, we have recruited leaders from OpenAI and Humane Intelligence, tripled the capacity of our research team, announced 6 new research partnerships, and helped establish the UK’s fastest supercomputer.

First Progress Report

Organisation • Sep 7, 2023

In our first 11 weeks, we have recruited an advisory board of national security and ML leaders, including Yoshua Bengio, recruited top professors from Cambridge and Oxford, and announced 4 research partnerships.

Improving our understanding of advanced AI

Transcript analysis for AI agent evaluations

Examining backdoor data poisoning at scale

Do chatbots inform or misinform voters?

How we’re working with frontier AI developers to improve model security

From bugs to bypasses: adapting vulnerability disclosure for AI safeguards
