If you don't like being deepfaked, this honestly won't do much to help. But it will at least fix the lowest-hanging fruit.
AI benchmarks rely on models not knowing they’re being tested. Anthropic revealed that Claude Opus 4.6 figured it out anyway, identifying the BrowseComp benchmark by name and decrypting its encrypted ...