Our dedicated library to make AI control experiments easy, consistent, and repeatable.
An introduction to white box control, and an update on our research so far.
Our new paper outlines how AI control methods can mitigate misalignment risks as the capabilities of AI systems increase.