| # | Model | Provider | Type | Score |
|---|---|---|---|---|
| 1 | GPT-5.5 Pro | OpenAI | Closed | 90.1% |
| 2 | GPT-5.4 Pro | OpenAI | Closed | 89.3% |
| 3 | Claude Mythos Preview | Anthropic | Closed | 86.9% |
| 4 | GPT-5.5 | OpenAI | Closed | 84.4% |
| 5 | Claude Opus 4.8 | Anthropic | Closed | 84.3% |
| 6 | Claude Opus 4.6 | Anthropic | Closed | 83.7% |
| 7 | MiniMax M3 | MiniMax | Open | 83.5% |
| 8 | DeepSeek V4 Pro (Max) | DeepSeek | Open | 83.4% |
| 9 | Kimi K2.6 | Moonshot AI | Open | 83.2% |
| 10 | GPT-5.4 | OpenAI | Closed | 82.7% |
| 11 | DeepSeek V4 Pro (High) | DeepSeek | Open | 80.4% |
| 12 | Claude Opus 4.7 (Adaptive) | Anthropic | Closed | 79.3% |
| 13 | Step 3.7 Flash | StepFun | Open | 75.8% |
| 14 | DeepSeek V4 Flash (Max) | DeepSeek | Open | 73.2% |
| 15 | GLM-5.1 | Z.AI | Open | 68% |
| 16 | GPT-5.2 | OpenAI | Closed | 65.8% |
| 17 | Qwen3.5-122B-A10B | Alibaba | Open | 63.8% |
| 18 | Qwen3.5 397B | Alibaba | Open | 62% |
| 19 | Qwen3.5-27B | Alibaba | Open | 61% |
| 20 | Qwen3.5-35B-A3B | Alibaba | Open | 61% |
| 21 | Kimi K2.5 (Reasoning) | Moonshot AI | Closed | 60.6% |
| 22 | Kimi K2.5 | Moonshot AI | Open | 60.6% |
| 23 | DeepSeek V4 Flash (High) | DeepSeek | Open | 53.5% |
| 24 | GLM-4.7 | Z.AI | Open | 52% |