Arena Blog

Latest

Inside BullshitBench: AI Models and Nonsense Detection

AI failures like hallucinations are well documented. A less examined problem is that models will accept nonsensical premises without question and produce confident, detailed answers to questions that have no valid answer. BullshitBench measures whether models challenge broken premises or play along. We tested over 80 models from all major

Supporting Independent Research in AI Evaluation

Image Arena Improvements: New Categories & Quality Filtering

Introducing Max

LMArena is now Arena

Video Arena Is Live on Web

Fueling the World’s Most Trusted AI Evaluation Platform

Research

Supporting Independent Research in AI Evaluation

Arena’s Academic Partnerships Program provides funding and support for independent research advancing the scientific foundations of AI evaluation.

Introducing Max

Studying the Frontier: Arena Expert

Arena's Ranking Method

Arena Expert and Occupational Categories

Re-introducing Vision Arena Categories

Introducing BiomedArena.AI: Evaluating LLMs for Biomedical Discovery

News

Supporting Independent Research in AI Evaluation

Arena’s Academic Partnerships Program provides funding and support for independent research advancing the scientific foundations of AI evaluation.

Introducing Max

LMArena is now Arena

Video Arena Is Live on Web

Fueling the World’s Most Trusted AI Evaluation Platform

Arena's Ranking Method

The Next Stage of AI Coding Evaluation Is Here
