| # | Model | Provider | Type | Score | |
|---|---|---|---|---|---|
| 1 | Gemini 3.5 Flash | Google | Closed | 83.6% | |
| 2 | Claude Opus 4.8 | Anthropic | Closed | 82.2% | |
| 3 | Claude Opus 4.7 (Adaptive) | Anthropic | Closed | 77.3% | |
| 4 | Qwen3.7 Max | Alibaba | Closed | 76.4% | |
| 5 | GPT-5.5 | OpenAI | Closed | 75.3% | |
| 6 | DeepSeek V4 Pro (High) | DeepSeek | Open | 74.2% | |
| 7 | MiniMax M3 | MiniMax | Open | 74.2% | |
| 8 | DeepSeek V4 Pro (Max) | DeepSeek | Open | 73.6% | |
| 9 | GLM-5.1 | Z.AI | Open | 71.8% | |
| 10 | GPT-5.4 | OpenAI | Closed | 70.6% | |
| 11 | DeepSeek V4 Pro | DeepSeek | Open | 69.4% | |
| 12 | DeepSeek V4 Flash (Max) | DeepSeek | Open | 69% | |
| 13 | DeepSeek V4 Flash (High) | DeepSeek | Open | 67.4% | |
| 14 | DeepSeek V4 Flash | DeepSeek | Open | 64% | |
| 15 | Qwen3.6-35B-A3B | Alibaba | Open | 62.8% | |
| 16 | GPT-5.4 mini | OpenAI | Closed | 57.7% | |
| 17 | GPT-5.4 nano | OpenAI | Closed | 56.1% | |
| 18 | Kimi K2.6 | Moonshot AI | Open | 55.9% | |
| 19 | Qwen3.6 Plus | Alibaba | Closed | 48.2% | |
| 20 | Qwen3.5 397B | Alibaba | Open | 46.1% | |
| 21 | Claude Opus 4.5 | Anthropic | Closed | 42.3% | |
| 22 | GLM-5 | Z.AI | Open | 31.1% | |
| 23 | Kimi K2.5 | Moonshot AI | Open | 29.5% | |