
A structured protocol for elicitation experiments

Calibrating AI risk assessment through rigorous elicitation practices.

Identifying dangerous AI capabilities requires rigorous evaluation of models at the upper limit of their abilities—a process that is far from straightforward. There are many ways to enhance the performance of a model after it has been trained, from carefully crafted prompts to external tool access. We need evaluation practices that uncover the full range of what models can be used to achieve.

These practices fall under the umbrella of elicitation, which plays a critical role in safety and security work. Done well, it helps us avoid underestimating what models can do, thereby improving the accuracy of and confidence in claims related to risk thresholds. This refined assessment directly informs how we test models at AISI and supports the UK government’s understanding of AI-related risks.  

But elicitation is hard to get right. It’s part science, part craft—and current methods are often inconsistent and poorly transferable across models and tasks. As a result, it is difficult to accumulate insights or apply lessons across different contexts.

Over the past two months the AISI Science of Evaluations Team has standardised elicitation experiment practices across our workstreams to ensure experiments are more reproducible, comparable, and analysable. This blog explains why capability elicitation is so important and summarises our approach.  

You can also read our full elicitation best practices checklist.

What is capability elicitation?

Capability elicitation experiments are designed to unlock or enhance a model’s latent abilities after it has been trained, in order to best understand its capability profile. Techniques include:

  • Using prompts that encourage strategic thinking or task decomposition  
  • Giving the model access to external tools like a command line, a Python interpreter, or the internet  
  • Generating multiple candidate responses and then choosing the best one
  • Embedding the model within an agent scaffold—a structured process that guides iterative response refinement or decision-making
  • Creating multi-agent setups where several instances of the model critique or debate each other to reach better conclusions

All of these methods help the model generate more accurate responses, and follow more effective and efficient sequences of actions, than it would in response to a simple query.
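To make one of these techniques concrete, here is a minimal sketch of best-of-n sampling (generating multiple candidate responses and choosing the best one). The model call and scorer are hypothetical placeholders, not AISI tooling; in practice they would be a real model API and a verifier, test suite, or judge model.

```python
import random


def generate_response(prompt: str, seed: int) -> str:
    """Placeholder for one stochastic model completion (hypothetical)."""
    rng = random.Random(seed)
    return f"answer with quality {rng.random():.3f}"


def score_response(response: str) -> float:
    """Placeholder scorer: here we simply parse the embedded quality number."""
    return float(response.rsplit(" ", 1)[-1])


def best_of_n(prompt: str, n: int = 8) -> str:
    """Best-of-n elicitation: sample n candidates, keep the highest scoring."""
    candidates = [generate_response(prompt, seed=i) for i in range(n)]
    return max(candidates, key=score_response)


best = best_of_n("Solve the task.", n=8)
```

By construction, the selected response scores at least as well as any single sample, which is why best-of-n gives a tighter lower bound on a model's capability than a one-shot query.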

Why is capability elicitation important?

Conducting capability elicitation experiments is crucial for accurately estimating the full potential of AI systems, especially in risk-relevant scenarios. Without these techniques, we may significantly underestimate what models can achieve and what skilled or malicious users might be able to accomplish by unlocking hidden capabilities. Research indicates that elicitation techniques can substantially enhance model performance, with improvements comparable to increasing training compute between five and twenty times.

Capability Elicitation at AISI

Avoiding the underestimation of risk-relevant capabilities is a key priority at AISI. This is why we conduct capability elicitation research across all major risk domains, including cyber, autonomous systems, and criminal misuse.

A Structured Protocol for Elicitation

To support this work, we have developed a structured elicitation protocol—a checklist of best practices intended to guide rigorous and informative experimentation. The protocol is designed to promote:  

#1 Scientific rigour

The checklist provides best practices for experimental design, including establishing an appropriate baseline, identifying a specific analysis type, and conducting a pilot study.  

We also ensure alignment with core machine learning standards, such as proper treatment of train-test distribution shift and the avoidance of data leakage.
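As a sketch of the kind of analysis the checklist asks for, the comparison between a baseline (simple-query) condition and an elicited condition can be reported with an uncertainty estimate rather than a bare accuracy difference. The per-task success data below is invented for illustration; the bootstrap routine is a generic percentile method, not a specific AISI procedure.

```python
import random

# Hypothetical per-task success indicators (1 = solved) from paired runs.
baseline = [0, 1, 0, 0, 1, 0, 1, 0, 0, 1]  # simple-query condition
elicited = [1, 1, 0, 1, 1, 1, 1, 0, 1, 1]  # with elicitation techniques


def bootstrap_diff_ci(a, b, iters=10_000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for mean(b) - mean(a), resampling tasks."""
    rng = random.Random(seed)
    n = len(a)
    diffs = []
    for _ in range(iters):
        idx = [rng.randrange(n) for _ in range(n)]
        diffs.append(sum(b[i] for i in idx) / n - sum(a[i] for i in idx) / n)
    diffs.sort()
    lo = diffs[int(alpha / 2 * iters)]
    hi = diffs[int((1 - alpha / 2) * iters) - 1]
    return lo, hi


low, high = bootstrap_diff_ci(baseline, elicited)
```

Resampling tasks (rather than runs) reflects that the main source of variance in small evaluation suites is usually which tasks happen to be included.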

#2 Internal knowledge sharing

To ensure shared standards across the team, we report results using consistent formats and shared documentation. Researchers report both positive and negative results to prevent duplicated effort and refine our collective approach.  

#3 Elicitation craftsmanship

Our checklist offers guidance on effective prompt design and agent scaffold construction, to ensure we are eliciting the full capabilities of the models we test. It covers detecting and fixing egregious failures, ensuring appropriate use of external tools, and systematically exploring prompting techniques that could improve model performance, including chain-of-thought reasoning.  
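As one illustrative craftsmanship item, a few-shot chain-of-thought prompt can be assembled from a task and worked examples. The function and its names are hypothetical, not part of any AISI tooling.

```python
def build_cot_prompt(task: str, examples: list[tuple[str, str]]) -> str:
    """Assemble a few-shot chain-of-thought prompt.

    `examples` are (question, worked_reasoning) pairs; the cue phrase
    "Let's think step by step" encourages the model to reason before answering.
    """
    parts = []
    for question, reasoning in examples:
        parts.append(f"Q: {question}\nA: Let's think step by step. {reasoning}")
    # The final question is left open for the model to complete.
    parts.append(f"Q: {task}\nA: Let's think step by step.")
    return "\n\n".join(parts)


prompt = build_cot_prompt(
    "What is 12 * 13?",
    [("What is 3 * 4?", "3 * 4 = 12. The answer is 12.")],
)
```

Worked examples set the expected format and depth of reasoning; the checklist treats finding good examples and cue phrasings as part of the craft rather than a fixed recipe.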

Certain elements—particularly those related to the more intuitive or creative aspects of elicitation—are treated as evolving norms, shaped by our growing community of practice at AISI. We’ve established a dedicated working group to coordinate ongoing efforts.

Modelling Threat Actors Through Elicitation

Elicitation experiments also help us contextualise a model’s capabilities relative to the skills, knowledge, and resources available to different types of malicious actors. These actors may vary widely in their technical ability to apply elicitation techniques, their level of domain-specific expertise to craft effective prompts and assess outputs, and their access to computational resources required for more complex methods, such as multi-agent frameworks.
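One way to make actor assumptions explicit is to record them as structured profiles that an experiment can be parameterised against. The fields, categories, and example values below are illustrative assumptions, not an AISI threat taxonomy.

```python
from dataclasses import dataclass, field


@dataclass
class ThreatActorProfile:
    """Illustrative actor profile; all fields and values are assumptions."""
    name: str
    domain_expertise: str                 # e.g. "low", "medium", "high"
    compute_budget_usd: int               # rough budget for elicitation
    techniques: list = field(default_factory=list)  # techniques assumed available


novice = ThreatActorProfile("novice", "low", 100, ["basic prompting"])
expert = ThreatActorProfile(
    "expert", "high", 100_000,
    ["prompt engineering", "tool use", "multi-agent scaffolds"],
)
```

Running the same task suite under each profile's permitted techniques then yields capability estimates conditioned on actor resources, rather than a single undifferentiated number.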

Elicitation as a Tool for Forecasting

We also believe that elicitation experiments show promise as a tool for forecasting, serving as a proxy for where model capabilities may be headed. This is an area of ongoing interest for us, as it may offer a low-cost method for predicting future capability levels in large language models.

Building shared elicitation practices

Looking ahead, we believe that shared capability elicitation practices must be established across the AI safety, security, and broader capabilities communities. This will require common tools, shared norms, and spaces to responsibly exchange best practices and empirical results. Collaboration can help advance our collective understanding of how to bring out the full capabilities—both beneficial and dangerous—of current and future models. We hope that our best practices checklist can help to build this foundation.

If you are working on evaluation, red teaming, risk modelling, or applied alignment, we encourage you to prioritise effective elicitation as a core component of your approach—and contribute to a growing community of practice dedicated to getting it right.