Engineering • Mar 23, 2026
We introduce SandboxEscapeBench, the first benchmark to systematically evaluate whether AI agents can break out of their sandboxes, and share some early results.
August 7, 2025
A comprehensive toolkit for safely evaluating AI agents.