| # | Model | Provider | Type | Score |
|---|---|---|---|---:|
| 1 | Qwen3.7 Max | Alibaba | Closed | 89.6% |
| 2 | Claude Opus 4.5 | Anthropic | Closed | 89.5% |
| 3 | Qwen3.6 Plus | Alibaba | Closed | 88.5% |
| 4 | Qwen3.5 397B | Alibaba | Open | 87.8% |
| 5 | DeepSeek V4 Pro (Max) | DeepSeek | Open | 87.5% |
| 6 | DeepSeek V4 Pro (High) | DeepSeek | Open | 87.1% |
| 7 | Kimi K2.5 (Reasoning) | Moonshot AI | Closed | 87.1% |
| 8 | Kimi K2.5 | Moonshot AI | Open | 87.1% |
| 9 | Qwen3.5-122B-A10B | Alibaba | Open | 86.7% |
| 10 | DeepSeek V4 Flash (High) | DeepSeek | Open | 86.4% |
| 11 | DeepSeek V4 Flash (Max) | DeepSeek | Open | 86.2% |
| 12 | Qwen3.6-27B | Alibaba | Open | 86.2% |
| 13 | Qwen3.5-27B | Alibaba | Open | 86.1% |
| 14 | GLM-5 | Z.AI | Open | 85.7% |
| 15 | Qwen3.5-35B-A3B | Alibaba | Open | 85.3% |
| 16 | Qwen3.6-35B-A3B | Alibaba | Open | 85.2% |
| 17 | Gemma 4 31B | Google | Open | 85.2% |
| 18 | MiMo-V2-Flash | Xiaomi | Open | 84.9% |
| 19 | GLM-4.7 | Z.AI | Open | 84.3% |
| 20 | DeepSeek V4 Flash | DeepSeek | Open | 83.0% |
| 21 | Qwen3 235B 2507 | Alibaba | Open | 83.0% |
| 22 | DeepSeek V4 Pro | DeepSeek | Open | 82.9% |
| 23 | Gemma 4 26B A4B | Google | Open | 82.6% |
| 24 | Claude Opus 4.6 | Anthropic | Closed | 82.0% |
| 25 | Exaone 4.0 32B | LG AI Research | Open | 81.8% |
| 26 | Claude Sonnet 4.6 | Anthropic | Closed | 79.2% |
| 27 | Nemotron 3 Nano Omni 30B A3B | NVIDIA | Open | 77.3% |
| 28 | DeepSeek V3 | DeepSeek | Open | 75.9% |
| 29 | ZAYA1-8B | Zyphra | Open | 74.2% |
| 30 | DeepSeek V4 Pro Base | DeepSeek | Open | 73.5% |
| 31 | Gemma 4 E4B | Google | Open | 69.4% |
| 32 | DeepSeek V4 Flash Base | DeepSeek | Open | 68.3% |
| 33 | ZAYA1-74B-Preview | Zyphra | Open | 68.1% |
| 34 | Gemma 4 E2B | Google | Open | 60.0% |
| 35 | MiniCPM5-1B | OpenBMB | Open | 48.9% |
| 36 | LFM2.5-VL-450M | LiquidAI | Open | 19.3% |