Blog

Releasing AISI’s Engineering Playbook

Engineering

June 18, 2026

Building on the momentum of the Inspect toolkit, we’re open-sourcing parts of the research stack behind AISI’s evaluations.

What can sandboxed AI agents learn about their evaluation environments?

Engineering

April 20, 2026

We deployed the open-source AI agent OpenClaw inside a sandbox on our research platform. Despite our initial countermeasures, it successfully identified our organisation by name, inferred the identity of a human operator, and reconstructed a timeline of some of our research activities.

Can AI agents escape their sandboxes? A benchmark for safely measuring container breakout capabilities

Engineering

March 23, 2026

We introduce SandboxEscapeBench, the first benchmark to systematically evaluate whether AI agents can break out of their sandboxes, and share some early results.

The Inspect Sandboxing Toolkit: Scalable and secure AI agent evaluations

Engineering

August 7, 2025

A comprehensive toolkit for safely evaluating AI agents.