A comprehensive toolkit for safely evaluating AI agents.
A comprehensive benchmark to detect emerging replication abilities in AI systems and provide a quantifiable understanding of potential risks.
AISI funded Epoch AI to explore AI researchers’ differing predictions on the automation of AI research and development and their suggestions for how to evaluate relevant capabilities.