AISI develops and conducts model evaluations to assess risks from cyber, chemical, biological misuse; autonomous capabilities and the effectiveness of safeguards. We also are also working to advance foundational safety and societal resilience research.
We studied whether people want AI to be more human-like.
A common technique for quickly assessing AI capabilities is prompting models to answer hundreds of questions, then automatically scoring the answers. We share insights from months of using this method.
AISI is bringing together AI companies and researchers for an invite-only conference to accelerate the design and implementation of frontier AI safety frameworks. This post shares the call for submissions that we sent to conference attendees.
AISI funded Epoch AI to explore AI researchers’ differing predictions on the automation of AI research and development and their suggestions for how to evaluate relevant capabilities.
As a complement to our empirical evaluations of frontier AI models, AISI is planning a series of collaborations and research projects sketching safety cases for more advanced models than exist today, focusing on risks from loss of control and autonomy. By a safety case, we mean a structured argument that an AI system is safe within a particular training or deployment context.
We tested leading AI models for cyber, chemical, biological, and agent capabilities and safeguards effectiveness. Our first technical blog post shares a snapshot of our methods and results.
This is an up-to-date, evidence-based report on the science of advanced AI safety. It highlights findings about AI progress, risks, and areas of disagreement in the field. The report is chaired by Yoshua Bengio and coordinated by AISI.
We open-sourced our framework for large language model evaluation, which provides facilities for prompt engineering, tool usage, multi-turn dialogue, and model-graded evaluations.
This post offers an overview of why we are doing this work, what we are testing for, how we select models, our recent demonstrations and some plans for our future work.