We are now the AI Security Institute

Safeguards

Breaking agent backbones: Evaluating the security of backbone LLMs in AI agents

Safeguards • Oct 26, 2025

Poisoning Attacks on LLMs Require a Near-constant Number of Poison Samples

Safeguards • Oct 8, 2025

Deep ignorance: Filtering pretraining data builds tamper-resistant safeguards into open-weight LLMs

Safeguards • Aug 8, 2025

Security challenges in AI agent deployment: Insights from a large scale public competition

Safeguards • Jul 27, 2025

STACK: Adversarial attacks on LLM safeguard pipelines

Safeguards • Jul 18, 2025

Existing Large Language Model unlearning evaluations are inconclusive

Safeguards • May 31, 2025

Adversarial machine learning: A taxonomy and terminology of attacks and mitigations

Safeguards • Mar 24, 2025

Model tampering attacks enable more rigorous evaluations of LLM capabilities

Safeguards • Mar 2, 2025

Fundamental limitations in defending LLM finetuning APIs

Safeguards • Feb 20, 2025

Principles for evaluating misuse safeguards of frontier AI systems

Safeguards • Feb 4, 2025

AgentHarm: A benchmark for measuring harmfulness of LLM agents

Safeguards • Oct 11, 2024

Examining backdoor data poisoning at scale

Safeguards • October 9, 2025

Our work with Anthropic and the Alan Turing Institute suggests that data poisoning attacks may be easier than previously believed.

How we’re working with frontier AI developers to improve model security

Safeguards • September 13, 2025

Insights into our ongoing voluntary collaborations with Anthropic and OpenAI.

From bugs to bypasses: adapting vulnerability disclosure for AI safeguards

Safeguards • September 2, 2025

Exploring how far cyber security approaches can help mitigate risks in generative AI systems, in collaboration with the National Cyber Security Centre (NCSC).

Managing risks from increasingly capable open-weight AI systems

Safeguards • August 29, 2025

Current methods and open problems in open-weight model risk management.

Making safeguard evaluations actionable

Safeguards • May 29, 2025

An Example Safety Case for Safeguards Against Misuse

Principles for safeguard evaluation

Safeguards • February 4, 2025

Our new paper proposes core principles for evaluating misuse safeguards.
