
An Example Safety Case for Safeguards Against Misuse

Read the Full Paper

Authors


Joshua Clymer, Jonah Weinbaum, Robert Kirk, Kimberly Mai, Selena Zhang, Xander Davies

Abstract

Existing evaluations of AI misuse safeguards provide a patchwork of evidence that is often difficult to connect to real-world decisions. To bridge this gap, we describe an end-to-end argument (a “safety case”) that misuse safeguards reduce the risk posed by an AI assistant to low levels. We first describe how a hypothetical developer red teams safeguards, estimating the effort required to evade them. The developer then plugs this estimate into a quantitative “uplift model” to determine how much the barriers introduced by safeguards dissuade misuse (https://www.aimisusemodel.com/). This procedure provides a continuous signal of risk during deployment that helps the developer respond rapidly to emerging threats. Finally, we describe how to tie these components together into a simple safety case. Our work provides one concrete path, though not the only one, to rigorously justifying that AI misuse risks are low.
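
To illustrate the shape of the procedure the abstract describes, here is a minimal, hypothetical sketch of an uplift-style calculation: a red-team estimate of evasion effort feeds into a model of how many would-be misusers are dissuaded. All function names, parameters, and values below are illustrative assumptions, not the model published at https://www.aimisusemodel.com/ or in the paper.

```python
import math

# Toy sketch of a quantitative "uplift model": risk falls as the effort
# needed to evade safeguards rises. Parameters are purely illustrative.

def p_attempt_persists(evasion_effort_hours: float, give_up_scale: float = 20.0) -> float:
    """Assumed probability that a would-be misuser keeps trying rather than
    giving up, decaying exponentially with the effort needed to evade safeguards."""
    return math.exp(-evasion_effort_hours / give_up_scale)

def residual_risk(baseline_risk: float, evasion_effort_hours: float) -> float:
    """Risk remaining after safeguards: the baseline misuse risk scaled by the
    fraction of attackers assumed to persist long enough to evade them."""
    return baseline_risk * p_attempt_persists(evasion_effort_hours)

# Example: red teaming estimates ~40 hours of effort to evade safeguards;
# baseline risk with no safeguards is normalised to 1.0.
print(f"Residual risk: {residual_risk(1.0, 40.0):.2f}")  # ~0.14
```

In the paper's framing, an estimate like this would be recomputed as red-team results and deployment evidence change, giving the continuous risk signal mentioned above.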

Notes