Loss of Oversight: How AI systems may become harder to audit, monitor, and investigate

Authors

Jordan Taylor, Max Heitmann, Ed Fage, Thomas Read, Joseph Bloom

Blog post

Will it become harder to oversee AI systems?

Abstract

The safety of advanced AI systems increasingly depends on the ability to oversee them: to audit models for concerning behaviours before deployment, monitor their activity during operation, and investigate incidents after they occur. This report maps the landscape of AI oversight and assesses how it is likely to change. Drawing on 25 expert interviews across frontier AI developers, government, NGOs, and academia, together with a literature review and internal analysis, we examine five sources of oversight signal: model behaviour, chain-of-thought reasoning, internal activations and circuits, memory architectures, and honesty training. For each source, we identify the properties that current oversight relies on, the pathways by which these properties could degrade, and the technical levers available to preserve them. Our central finding, supported by both the literature and expert opinion, is that current oversight rests on foundations that are likely to erode absent effective intervention. We recommend that developers track and report shifts in oversight-relevant properties, preserve oversight affordances by design, and invest in emerging oversight techniques as fallbacks against continued degradation of current methods.

