Inside BullshitBench: AI Models and Nonsense Detection
AI failures like hallucinations are well documented. A less examined problem is that models will accept nonsensical premises without question and produce confident, detailed answers to questions that have no valid answer. BullshitBench measures whether models challenge broken premises or play along. We tested over 80 models from all major