We tested seven large language models (LLMs) on two custom-built cyber ranges, measuring their ability to execute extended attack sequences in complex environments.
A comprehensive benchmark to detect emerging replication abilities in AI systems and provide a quantifiable understanding of potential risks.
AISI funded Epoch AI to explore AI researchers’ differing predictions on the automation of AI research and development and their suggestions for how to evaluate relevant capabilities.