Model Transparency
•
December 9, 2025
Our new paper shares the results of an auditing game to evaluate ten methods for sandbagging detection in AI models.