Engineering • March 23, 2026
We introduce SandboxEscapeBench, the first benchmark to systematically evaluate whether AI agents can break out of their sandboxes, and share early results.
August 7, 2025
A comprehensive toolkit for safely evaluating AI agents.