| # | Model | Provider | Type | Score |
|---|---|---|---|---|
| 1 | Claude Mythos Preview | Anthropic | Closed | 77.8% |
| 2 | Claude Opus 4.8 | Anthropic | Closed | 69.2% |
| 3 | Claude Opus 4.7 (Adaptive) | Anthropic | Closed | 64.3% |
| 4 | Qwen3.7 Max | Alibaba | Closed | 60.6% |
| 5 | MiniMax M3 | MiniMax | Open | 59.0% |
| 6 | GPT-5.5 | OpenAI | Closed | 58.6% |
| 7 | Kimi K2.6 | Moonshot AI | Open | 58.6% |
| 8 | GLM-5.1 | Z.AI | Open | 58.4% |
| 9 | GPT-5.4 | OpenAI | Closed | 57.7% |
| 10 | Qwen 3.6 Max (preview) | Alibaba | Closed | 57.3% |
| 11 | MiMo-V2.5-Pro | Xiaomi | Closed | 57.2% |
| 12 | Claude Opus 4.5 | Anthropic | Closed | 57.1% |
| 13 | GPT-5.3 Codex | OpenAI | Closed | 56.8% |
| 14 | Qwen3.6 Plus | Alibaba | Closed | 56.6% |
| 15 | Step 3.7 Flash | StepFun | Open | 56.3% |
| 16 | MiniMax M2.7 | MiniMax | Open | 56.2% |
| 17 | MiMo-V2.5 | Xiaomi | Closed | 56.1% |
| 18 | GPT-5.2 | OpenAI | Closed | 55.6% |
| 19 | DeepSeek V4 Pro (Max) | DeepSeek | Open | 55.4% |
| 20 | Gemini 3.5 Flash | Google | Closed | 55.1% |
| 21 | GLM-5 | Z.AI | Open | 55.1% |
| 22 | DeepSeek V4 Pro (High) | DeepSeek | Open | 54.4% |
| 23 | Qwen3.6-27B | Alibaba | Open | 53.5% |
| 24 | Claude Opus 4.6 | Anthropic | Closed | 53.4% |
| 25 | DeepSeek V4 Flash (Max) | DeepSeek | Open | 52.6% |
| 26 | Muse Spark | Meta | Closed | 52.4% |
| 27 | DeepSeek V4 Flash (High) | DeepSeek | Open | 52.3% |
| 28 | DeepSeek V4 Pro | DeepSeek | Open | 52.1% |
| 29 | Grok 4.20 | xAI | Closed | 51.8% |
| 30 | Qwen3.5 397B | Alibaba | Open | 50.9% |
| 31 | Kimi K2.5 | Moonshot AI | Open | 50.7% |
| 32 | Qwen3.6-35B-A3B | Alibaba | Open | 49.5% |
| 33 | Laguna M.1 | Poolside | Closed | 49.2% |
| 34 | DeepSeek V4 Flash | DeepSeek | Open | 49.1% |
| 35 | Laguna XS.2 | Poolside | Open | 46.3% |