Code Arena | Overall

View overall rankings across AI models on agentic coding tasks involving multi-step reasoning and tool use.

Mar 20, 2026

209,727 votes

56 models

Rank	Rank Spread	Model	Organization · License	Score	Votes	Price (input / output, $ per 1M tokens)	Context
1	1-2	claude-opus-4-6	Anthropic · Proprietary	1548 +12/-12	4,059	$5 / $25	1M
2	1-2	claude-opus-4-6-thinking	Anthropic · Proprietary	1546 +12/-12	3,317	$5 / $25	1M
3	3-3	claude-sonnet-4-6	Anthropic · Proprietary	1521 +9/-9	5,876	$3 / $15	1M
4	4-4	claude-opus-4-5-20251101-thinking-32k	Anthropic · Proprietary	1489 +7/-7	13,259	$5 / $25	200K
5	5-8	claude-opus-4-5-20251101	Anthropic · Proprietary	1465 +7/-7	13,313	$5 / $25	200K
6	5-14	gpt-5.4-high (codex-harness)	OpenAI · Proprietary	1457 +17/-17	1,486	N/A	N/A
7	5-12	gemini-3.1-pro-preview	Google · Proprietary	1454 +10/-10	4,364	$2 / $12	1M
8	6-15	glm-5	Z.ai · MIT	1445 +10/-10	4,316	$1 / $3.20	202.8K
9	5-15	minimax-m2.7	MiniMax · Proprietary	1445 +14/-14	2,015	$0.30 / $1.20	204.8K
10	6-15	glm-4.7	Z.ai · MIT	1439 +10/-10	4,971	$0.39 / $1.75	202.8K
11	6-15	gemini-3-pro	Google · Proprietary	1437 +7/-7	17,483	$2 / $12	1M
12	7-15	gemini-3-flash	Google · Proprietary	1436 +7/-7	13,404	$0.50 / $3	1M
13	6-16	mimo-v2-pro	Xiaomi · Proprietary	1436 +16/-16	1,350	$1 / $3	1M
14	8-15	kimi-k2.5-thinking	Moonshot · Modified MIT	1431 +9/-9	5,987	$0.60 / $3	N/A
15	7-19	gpt-5.4-medium (codex-harness)	OpenAI · Proprietary	1428 +16/-16	1,574	N/A	N/A
16	15-22	minimax-m2.5	MiniMax · Modified MIT	1410 +9/-9	5,796	$0.20 / $1.17	196.6K
17	15-22	kimi-k2.5-instant	Moonshot · Modified MIT	1409 +11/-11	3,632	$0.45 / $2.20	262.1K
18	14-23	gpt-5.3-codex (codex-harness)	OpenAI · Proprietary	1409 +12/-12	2,973	$1.75 / $14	400K
19	15-28	gpt-5.2	OpenAI · Proprietary	1400 +16/-16	1,531	$1.75 / $14	400K
20	16-27	minimax-m2.1-preview	MiniMax · MIT	1399 +8/-8	9,584	$0.27 / $0.95	196.6K
21	16-27	gemini-3-flash (thinking-minimal)	Google · Proprietary	1395 +7/-7	11,042	$0.50 / $3	1M
22	16-28	gpt-5-medium	OpenAI · Proprietary	1392 +12/-12	3,835	$1.25 / $10	400K
23	19-28	claude-sonnet-4-5-20250929-thinking-32k	Anthropic · Proprietary	1389 +6/-6	16,012	$3 / $15	200K
24	18-28	gpt-5.1-medium	OpenAI · Proprietary	1388 +9/-9	6,255	$1.25 / $10	400K
25	19-30	qwen3.5-397b-a17b	Alibaba · Apache 2.0	1386 +10/-10	4,535	$0.39 / $2.34	262.1K
26	19-28	claude-sonnet-4-5-20250929	Anthropic · Proprietary	1386 +6/-6	17,832	$3 / $15	200K
27	19-30	claude-opus-4-1-20250805	Anthropic · Proprietary	1384 +9/-9	8,738	$15 / $75	200K
28	21-32	grok-4.20-beta-0309-reasoning	xAI · Proprietary	1373 +14/-14	1,941	$2 / $6	2M
29	26-32	deepseek-v3.2-thinking	DeepSeek · MIT	1370 +8/-8	7,445	$0.26 / $0.38	163.8K
30	26-32	qwen3.5-122b-a10b	Alibaba · Apache 2.0	1367 +11/-11	3,239	$0.26 / $2.08	262.1K
31	28-34	glm-4.6	Z.ai · MIT	1354 +9/-9	8,522	$0.39 / $1.90	204.8K
32	28-35	qwen3.5-27b	Alibaba · Apache 2.0	1352 +12/-12	2,951	$0.20 / $1.56	262.1K
33	31-37	gpt-5.1	OpenAI · Proprietary	1339 +7/-7	13,088	$1.25 / $10	400K
34	31-37	mimo-v2-flash (non-thinking)	Xiaomi · MIT	1338 +8/-8	6,850	$0.09 / $0.29	262.1K
35	32-38	gpt-5.2-codex	OpenAI · Proprietary	1338 +8/-8	7,901	$1.75 / $14	400K
36	33-38	kimi-k2-thinking-turbo	Moonshot · Modified MIT	1328 +6/-6	14,436	$1.15 / $8	262.1K
37	33-39	gpt-5.1-codex	OpenAI · Proprietary	1326 +9/-9	6,346	$1.25 / $10	400K
38	35-41	deepseek-v3.2	DeepSeek · MIT	1322 +8/-8	8,886	$0.26 / $0.38	163.8K
39	38-41	claude-haiku-4-5-20251001	Anthropic · Proprietary	1309 +6/-6	15,758	$1 / $5	200K
40	37-41	minimax-m2	MiniMax · Apache 2.0	1309 +9/-9	8,602	$0.26 / $1	196.6K
41	38-43	mimo-v2-flash (thinking)	Xiaomi · MIT	1302 +14/-14	2,109	$0.09 / $0.29	262.1K
42	41-43	deepseek-v3.2-exp	DeepSeek · MIT	1285 +10/-11	5,012	$0.27 / $0.41	163.8K
43	41-43	qwen3-coder-480b-a35b-instruct	Alibaba · Apache 2.0	1282 +6/-6	15,471	$0.40 / $1.60	262.1K
44	44-49	KAT-Coder-Pro-V1	KwaiKAT · Proprietary	1258 +15/-15	1,925	$0.21 / $0.83	256K
45	44-50	gemini-3.1-flash-lite-preview	Google · Proprietary	1251 +16/-16	1,479	$0.25 / $1.50	1M
46	44-50	qwen3.5-35b-a3b	Alibaba · Apache 2.0	1249 +16/-16	1,818	$0.16 / $1.30	262.1K
47	44-51	gpt-5.1-codex-mini	OpenAI · Proprietary	1240 +17/-17	1,503	$0.25 / $2	400K
48	44-51	qwen3.5-flash	Alibaba · Proprietary	1238 +17/-17	1,560	N/A	N/A
49	44-50	grok-4-1-fast-reasoning	xAI · Proprietary	1234 +9/-9	6,977	$0.20 / $0.50	2M
50	45-53	mistral-large-3	Mistral · Apache 2.0	1221 +20/-20	1,031	$0.50 / $1.50	N/A
51	48-53	grok-4.1-thinking	xAI · Proprietary	1205 +19/-19	1,242	$0.20 / $0.50	N/A
52	50-53	gemini-2.5-pro	Google · Proprietary	1205 +13/-13	3,365	$1.25 / $10	1M
53	50-53	devstral-2	Mistral · Modified MIT	1198 +17/-17	1,603	N/A	N/A
54	54-55	grok-4-fast-reasoning	xAI · Proprietary	1149 +23/-23	936	$0.20 / $0.50	2M
55	54-56	grok-code-fast-1	xAI · Proprietary	1138 +22/-22	989	$0.20 / $1.50	256K
56	55-56	devstral-medium-2507	Mistral · Proprietary	1094 +22/-22	1,003	$0.40 / $2	128K
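
The rank spread column appears to give the best and worst rank each model could plausibly hold once the score intervals are taken into account. The source does not state how it is computed, so the sketch below is only one plausible reading, under an assumed interval-overlap rule: a model's best rank is one plus the number of models whose lower bound lies above its upper bound, and its worst rank is the total number of models minus those whose upper bound lies below its lower bound. The function name `rank_spread` and the `ROWS` layout are hypothetical, and only the first nine table rows are included, so only the top five spreads are complete.

```python
# Illustrative subset: (model, score, plus, minus) copied from the first nine table rows.
ROWS = [
    ("claude-opus-4-6", 1548, 12, 12),
    ("claude-opus-4-6-thinking", 1546, 12, 12),
    ("claude-sonnet-4-6", 1521, 9, 9),
    ("claude-opus-4-5-20251101-thinking-32k", 1489, 7, 7),
    ("claude-opus-4-5-20251101", 1465, 7, 7),
    ("gpt-5.4-high (codex-harness)", 1457, 17, 17),
    ("gemini-3.1-pro-preview", 1454, 10, 10),
    ("glm-5", 1445, 10, 10),
    ("minimax-m2.7", 1445, 14, 14),
]


def rank_spread(rows):
    """Return {model: (best_rank, worst_rank)} under the ASSUMED overlap rule.

    This is not confirmed to be the leaderboard's own method.
    """
    # Turn each score and its +/- margins into a (low, high) interval.
    intervals = {
        name: (score - minus, score + plus) for name, score, plus, minus in rows
    }
    spreads = {}
    for name, (low, high) in intervals.items():
        # Models whose whole interval sits above this model's interval.
        clearly_better = sum(1 for lo, _ in intervals.values() if lo > high)
        # Models whose whole interval sits below this model's interval.
        clearly_worse = sum(1 for _, hi in intervals.values() if hi < low)
        spreads[name] = (1 + clearly_better, len(rows) - clearly_worse)
    return spreads


SPREADS = rank_spread(ROWS)

# Print only the top five, whose spreads do not depend on the omitted rows.
for model, _, _, _ in ROWS[:5]:
    best, worst = SPREADS[model]
    print(f"{model}: {best}-{worst}")
```

On this subset the rule gives 1-2, 1-2, 3-3, 4-4 and 5-8 for the top five models, which matches the table. That agreement is suggestive only; the published intervals are rounded, and the actual procedure may differ.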

Leaderboard Plots

Fraction of Model A Wins for All Non-tied A vs. B Battles

Confidence Intervals on Model Strength (via Bootstrapping)

Battle Count for Each Combination of Models (without Ties)

Average Win Rate Against All Other Models (Uniform Sampling and No Ties)