
Principles for Evaluating Misuse Safeguards of Frontier AI Systems

Read the Full Paper

Authors


Safeguards Team

Abstract

Misuse safeguards—technical interventions implemented by frontier AI developers to prevent users from eliciting harmful information or actions from AI systems—are an important tool in addressing potential risks from the misuse of these systems. In many parts of machine learning, establishing clear problem statements and evaluations drives and accelerates progress, and we think this same lesson applies to safeguards. To this end, we propose five principles for rigorous evaluations of misuse safeguards, which form a step-by-step plan for safeguards assessment. We additionally release a lightweight template designed to enable developers to draw from our recommendations as they perform safeguards assessments. These documents aim to drive standardisation and rigour in how safeguards evaluations are performed, which we expect will become increasingly important as AI capabilities advance. We encourage frontier AI developers to use our principles and template, and to share their experience and feedback to help us improve safeguards evaluations going forwards.
