Our work with Anthropic and the Alan Turing Institute suggests that data poisoning attacks may be easier than previously believed.
Insights into our ongoing voluntary collaborations with Anthropic and OpenAI.
Exploring how far cyber security approaches can help mitigate risks in generative AI systems, in collaboration with the National Cyber Security Centre (NCSC).
Current methods and open problems in open-weight model risk management.
An Example Safety Case for Safeguards Against Misuse
Our new paper proposes core principles for evaluating misuse safeguards.