
Investigating models for misalignment

Red Team

November 26, 2025

Insights from our alignment evaluations of Claude Opus 4.1, Sonnet 4.5, and a pre-release snapshot of Opus 4.5.

Examining backdoor data poisoning at scale

Red Team

October 9, 2025

Our work with Anthropic and the Alan Turing Institute suggests that data poisoning attacks may be easier than previously believed.

How we’re working with frontier AI developers to improve model security

Red Team

September 13, 2025

Insights into our ongoing voluntary collaborations with Anthropic and OpenAI.

From bugs to bypasses: adapting vulnerability disclosure for AI safeguards

Red Team

September 2, 2025

Exploring how far cyber security approaches can help mitigate risks in generative AI systems, in collaboration with the National Cyber Security Centre (NCSC).

Managing risks from increasingly capable open-weight AI systems

Red Team

August 29, 2025

Current methods and open problems in open-weight model risk management.

Making safeguard evaluations actionable

Red Team

May 29, 2025

An Example Safety Case for Safeguards Against Misuse.

Principles for safeguard evaluation

Red Team

February 4, 2025

Our new paper proposes core principles for evaluating misuse safeguards.