Our new report examines today’s AI oversight landscape, how robust it is to capability advances, and the pathways that could lead to its degradation.
Our new paper shares the results of an auditing game to evaluate ten methods for sandbagging detection in AI models.