BenchLM Benchmarks
165 benchmarks · 480 model scores · Data from Jun 2, 2026
Reasoning · 19 benchmarks
bbh
3 models
1. DeepSeek V4 Pro Base · DeepSeek · 87.5%
2. DeepSeek V4 Flash Base · DeepSeek · 86.9%
3. MiniCPM5-1B · OpenBMB · 71.9%
lisan Bench
62 models
4. GPT 5.4 (medium) · openai/gpt-5.4:thinking-medium · Closed
6. Gemini 3.1 Pro Preview (high) · google/gemini-3.1-pro-preview:thinking-high · Closed
7. Grok 4 (medium) · x-ai/grok-4:thinking-medium · Closed
8. Grok 4.20 Beta (thinking) · x-ai/grok-4.20-beta:thinking · Closed
9. GPT 5 (medium) · openai/gpt-5 · Closed
+57 more
pp Bench
41 models
1. GPT-5.5 · OpenAI · gpt-5.5@xhigh
2. GPT-5.4 · OpenAI · gpt-5.4@xhigh
3. GPT-5.2 · OpenAI · gpt-5.2@xhigh
4. Claude Opus 4.7 · claude-opus-4-7@thinking · Closed
5. Gemini 3.5 Flash · gemini-3.5-flash@high · Closed
+36 more
long Bench V2
10 models
1. Claude Opus 4.5 · Anthropic · 64.4%
2. Qwen3.5 397B · Alibaba · 63.2%
3. Qwen3.6 Plus · Alibaba · 62%
4. Kimi K2.5 · Moonshot AI · 61%
5. GLM-5 · Z.AI · 60.8%
+5 more
mrcr1m
7 models
1. DeepSeek V4 Pro (Max) · DeepSeek · 83.5%
2. DeepSeek V4 Pro (High) · DeepSeek · 83.3%
3. DeepSeek V4 Flash (Max) · DeepSeek · 78.7%
4. DeepSeek V4 Flash (High) · DeepSeek · 76.9%
5. DeepSeek V4 Pro · DeepSeek · 44.7%
+2 more
corpus Qa1m
6 models
1. DeepSeek V4 Pro (Max) · DeepSeek · 62.0%
2. DeepSeek V4 Flash (Max) · DeepSeek · 60.5%
3. DeepSeek V4 Flash (High) · DeepSeek · 59.3%
4. DeepSeek V4 Pro (High) · DeepSeek · 56.5%
5. DeepSeek V4 Pro · DeepSeek · 35.6%
+1 more
arc Agi2
11 models
1. GPT-5.5 · OpenAI · 85%
2. GPT-5.4 Pro · OpenAI · 83.3%
3. Gemini 3.1 Pro · Google · 77.1%
4. Claude Opus 4.7 (Adaptive) · Anthropic · 75.8%
5. Gemini 3.5 Flash · Google · 72.1%
+6 more
ai Needle
4 models
1. Claude Opus 4.5 · Anthropic · 74%
2. Qwen3.5 397B · Alibaba · 68.7%
3. Qwen3.6 Plus · Alibaba · 68.3%
4. GLM-5 · Z.AI · 63.3%
gpqa Diamond
29 models
1. Gemini 3.1 Pro · Google · 94.3%
2. Claude Opus 4.7 (Adaptive) · Anthropic · 94.2%
3. Claude Opus 4.8 · Anthropic · 93.6%
4. GPT-5.5 · OpenAI · 93.6%
5. GPT-5.4 · OpenAI · 92.8%
+24 more
lcr
115 models
1. GPT-5.2-Codex · OpenAI · 75.7%
2. GPT-5 (high) · OpenAI · 75.6%
3. GPT-5.1 · OpenAI · 75.0%
4. GPT-5.5 · OpenAI · 74.3%
5. GPT-5.4 · OpenAI · 74.0%
+110 more
critpt
116 models
1. GPT-5.4 Pro · OpenAI · 30.0%
2. GPT-5.5 · OpenAI · 27.1%
3. Gemini 3 Pro Deep Think · Google · 25.7%
4. GPT-5.4 · OpenAI · 23.4%
5. Gemini 3.1 Pro · Google · 17.7%
+111 more
bullshit Bench V2
63 models
4. Claude Opus 4.6 (high) · anthropic/claude-opus-4.6@reasoning=high · Closed
6. Claude Opus 4.7 (none) · anthropic/claude-opus-4.7@reasoning=none · Closed
7. Claude Sonnet 4.5 (high) · anthropic/claude-sonnet-4.5@reasoning=high · Closed
9. Qwen3.5 397B (Reasoning) (high) · qwen/qwen3.5-397b-a17b@reasoning=high · Open
10. Claude Haiku 4.5 (high) · anthropic/claude-haiku-4.5@reasoning=high · Closed
+58 more