RepliBench: Evaluating the Autonomous Replication Capabilities of Language Model Agents

Authors

Sid Black
Asa Cooper Stickland
Jake Pencharz
Oliver Sourbut
Michael Schmatz
Jay Bailey
Ollie Matthews
Ben Millwood
Alex Remedios
Alan Cooney

Abstract

Uncontrolled self-replication of AI agents could pose a major safety risk. To study this, the researchers created RepliBench, a set of 201 tasks across four areas: resource acquisition, weight exfiltration, replication onto compute, and persistence. They tested five advanced models and found that current models are not yet capable of full self-replication, but are improving at its component tasks.

Notes