CodeNames Oversight Results Explorer

This is an interactive explorer for the research described in "Evaluating Oversight Robustness with Incentivized Reward Hacking".

How to Navigate

Use the navigation menu to explore results from the three training protocols.

Each subfolder follows the naming pattern: [overseer-type]-adv-[incentive-strength]
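
For anyone scripting over the results, the naming pattern above can be split back into its two parts. The following is a minimal sketch, not part of the explorer itself: the function name parse_subfolder, the SubfolderName type, and the example folder name "robust-adv-0.5" are all hypothetical, and the sketch assumes the literal separator "-adv-" appears once, between the overseer type and the incentive strength.

```python
from typing import NamedTuple


class SubfolderName(NamedTuple):
    """The two fields encoded in a results subfolder name (hypothetical type)."""
    overseer_type: str
    incentive_strength: str


def parse_subfolder(name: str) -> SubfolderName:
    """Split '[overseer-type]-adv-[incentive-strength]' into its parts.

    Assumption: the first occurrence of '-adv-' is the separator. The
    incentive strength is kept as a string because its exact format
    (number, label, etc.) is not specified here.
    """
    overseer, sep, strength = name.partition("-adv-")
    if not sep or not overseer or not strength:
        raise ValueError(f"not of the form [overseer-type]-adv-[incentive-strength]: {name!r}")
    return SubfolderName(overseer, strength)


# "robust-adv-0.5" is an invented example name, not a real folder.
parsed = parse_subfolder("robust-adv-0.5")
print(parsed.overseer_type, parsed.incentive_strength)
```

Using partition rather than split keeps any hyphens inside the overseer type or the incentive strength intact, as long as neither contains "-adv-" itself.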