Engineering

Research

Quantifying Frontier LLM Capabilities for Container Sandbox Escape

Engineering

•

Mar 23, 2026

What can sandboxed AI agents learn about their evaluation environments?

Engineering

•

April 20, 2026

We deployed open-source AI agent OpenClaw inside a sandbox on our research platform. Despite our initial countermeasures, it successfully identified our organisation by name, inferred the identity of a human operator and reconstructed a timeline of some of our research activities.

Can AI agents escape their sandboxes? A benchmark for safely measuring container breakout capabilities

Engineering

•

March 23, 2026

We introduce SandboxEscapeBench, the first benchmark to systematically evaluate whether AI agents can break out of their sandboxes, and share some early results.

The Inspect Sandboxing Toolkit: Scalable and secure AI agent evaluations

Engineering

•

August 7, 2025

A comprehensive toolkit for safely evaluating AI agents.

Research

Quantifying Frontier LLM Capabilities for Container Sandbox Escape

Blog

What can sandboxed AI agents learn about their evaluation environments?

Can AI agents escape their sandboxes? A benchmark for safely measuring container breakout capabilities

The Inspect Sandboxing Toolkit: Scalable and secure AI agent evaluations