Arena Blog
Peter Gostev

Inside BullshitBench: AI Models and Nonsense Detection

AI failures like hallucinations are well documented. A less examined problem is that models will accept nonsensical premises without question and produce confident, detailed answers to questions that have no valid answer. BullshitBench measures whether models challenge broken premises or play along. We tested over 80 models from all major
Peter Gostev 18 Mar 2026
Ⓒ Arena Intelligence 2026