[Figure labels: Cover Your Tracks (Original); Cover Your Tracks (Modified 1); Cover Your Tracks (Modified 2); Agentic misalignment; Satirical; Satirical LDA (alpha=8)]
[Figure labels: Baseline; System Prompt; User Prompt. Color key: green = removing this sentence results in more verbalized eval awareness; red = removing this sentence results in less verbalized eval awareness.]