| # | Model | Provider | Type | Score | |
|---|---|---|---|---|---|
| 1 | Claude Mythos Preview | Anthropic | Closed | 94.5% | |
| 2 | Claude Opus 4.7 (Adaptive) | Anthropic | Closed | 94.2% | |
| 3 | Claude Opus 4.8 | Anthropic | Closed | 93.6% | |
| 4 | GPT-5.5 | OpenAI | Closed | 93.6% | |
| 5 | GPT-5.4 | OpenAI | Closed | 92.8% | |
| 6 | Qwen3.7 Max | Alibaba | Closed | 92.4% | |
| 7 | GPT-5.2 | OpenAI | Closed | 92.4% | |
| 8 | Gemini 3.5 Flash | Google | Closed | 92.2% | |
| 9 | Claude Opus 4.6 | Anthropic | Closed | 91.3% | |
| 10 | Kimi K2.6 | Moonshot AI | Open | 90.5% | |
| 11 | Qwen3.6 Plus | Alibaba | Closed | 90.4% | |
| 12 | DeepSeek V4 Pro (Max) | DeepSeek | Open | 90.1% | |
| 13 | Grok 4.3 | xAI | Closed | 90.1% | |
| 14 | Claude Sonnet 4.6 | Anthropic | Closed | 89.9% | |
| 15 | Interfaze Beta | Interfaze | Closed | 89.9% | |
| 16 | DeepSeek V4 Pro (High) | DeepSeek | Open | 89.1% | |
| 17 | Qwen3.5 397B | Alibaba | Open | 88.4% | |
| 18 | DeepSeek V4 Flash (Max) | DeepSeek | Open | 88.1% | |
| 19 | GPT-5.4 mini | OpenAI | Closed | 88% | |
| 20 | Qwen3.6-27B | Alibaba | Open | 87.8% | |
| 21 | Kimi K2.5 (Reasoning) | Moonshot AI | Closed | 87.6% | |
| 22 | Kimi K2.5 | Moonshot AI | Open | 87.6% | |
| 23 | DeepSeek V4 Flash (High) | DeepSeek | Open | 87.4% | |
| 24 | Hy3 Preview | Tencent | Open | 87.2% | |
| 25 | Claude Opus 4.5 | Anthropic | Closed | 87% | |
| 26 | Qwen3.5-122B-A10B | Alibaba | Open | 86.6% | |
| 27 | GLM-5 | Z.AI | Open | 86% | |
| 28 | Qwen3.6-35B-A3B | Alibaba | Open | 86% | |
| 29 | GLM-4.7 | Z.AI | Open | 85.7% | |
| 30 | Qwen3.5-27B | Alibaba | Open | 85.5% | |
| 31 | Gemma 4 31B | Google | Open | 84.3% | |
| 32 | Qwen3.5-35B-A3B | Alibaba | Open | 84.2% | |
| 33 | MiMo-V2-Flash | Xiaomi | Open | 83.7% | |
| 34 | Claude Sonnet 4.5 | Anthropic | Closed | 83.4% | |
| 35 | Gemini 2.5 Pro | Google | Closed | 83% | |
| 36 | GPT-5.4 nano | OpenAI | Closed | 82.8% | |
| 37 | o1-pro | OpenAI | Closed | 79% | |
| 38 | Qwen3 235B 2507 | Alibaba | Open | 77.5% | |
| 39 | o3-mini | OpenAI | Closed | 77.2% | |
| 40 | o1 | OpenAI | Closed | 75.7% | |
| 41 | DeepSeek V4 Pro | DeepSeek | Open | 72.9% | |
| 42 | Nemotron 3 Nano Omni 30B A3B | NVIDIA | Open | 72.2% | |
| 43 | DeepSeek V4 Flash | DeepSeek | Open | 71.2% | |
| 44 | ZAYA1-8B | Zyphra | Open | 71% | |
| 45 | GPT-4.1 | OpenAI | Closed | 66.3% | |
| 46 | GPT-4.1 mini | OpenAI | Closed | 64.2% | |
| 47 | Claude 3.5 Sonnet | Anthropic | Closed | 59.4% | |
| 48 | DeepSeek V3 | DeepSeek | Open | 59.1% | |
| 49 | Ling 2.6 Flash | InclusionAI | Open | 59% | |
| 50 | Gemma 4 E4B | Google | Open | 58.6% | |
| 51 | ZAYA1-74B-Preview | Zyphra | Open | 57.3% | |
| 52 | GPT-4.1 nano | OpenAI | Closed | 50.3% | |
| 53 | Gemma 4 E2B | Google | Open | 43.4% | |
| 54 | LFM2.5-VL-450M | LiquidAI | Open | 25.7% | |