| # | Model | Provider | Type | Score | |
|---|---|---|---|---|---|
| 1 | Claude Mythos Preview | Anthropic | Closed | 93.9% | |
| 2 | Claude Opus 4.8 | Anthropic | Closed | 88.6% | |
| 3 | Claude Opus 4.7 (Adaptive) | Anthropic | Closed | 87.6% | |
| 4 | GPT-5.3 Codex | OpenAI | Closed | 85% | |
| 5 | Claude Opus 4.5 | Anthropic | Closed | 80.9% | |
| 6 | Claude Opus 4.6 | Anthropic | Closed | 80.8% | |
| 7 | DeepSeek V4 Pro (Max) | DeepSeek | Open | 80.6% | |
| 8 | MiniMax M3 | MiniMax | Open | 80.5% | |
| 9 | Qwen3.7 Max | Alibaba | Closed | 80.4% | |
| 10 | Kimi K2.6 | Moonshot AI | Open | 80.2% | |
| 11 | GPT-5.2 | OpenAI | Closed | 80% | |
| 12 | Claude Sonnet 4.6 | Anthropic | Closed | 79.6% | |
| 13 | DeepSeek V4 Pro (High) | DeepSeek | Open | 79.4% | |
| 14 | DeepSeek V4 Flash (Max) | DeepSeek | Open | 79% | |
| 15 | Qwen3.6 Plus | Alibaba | Closed | 78.8% | |
| 16 | DeepSeek V4 Flash (High) | DeepSeek | Open | 78.6% | |
| 17 | MiMo-V2-Pro | Xiaomi | Closed | 78% | |
| 18 | GLM-5 | Z.AI | Open | 77.8% | |
| 19 | Mistral Medium 3.5 128B | Mistral | Open | 77.6% | |
| 20 | Muse Spark | Meta | Closed | 77.4% | |
| 21 | Qwen3.6-27B | Alibaba | Open | 77.2% | |
| 22 | Claude Sonnet 4.5 | Anthropic | Closed | 77.2% | |
| 23 | Kimi K2.5 (Reasoning) | Moonshot AI | Closed | 76.8% | |
| 24 | Kimi K2.5 | Moonshot AI | Open | 76.8% | |
| 25 | Grok 4.20 | xAI | Closed | 76.7% | |
| 26 | Qwen3.5 397B | Alibaba | Open | 76.2% | |
| 27 | MiMo-V2-Omni | Xiaomi | Closed | 74.8% | |
| 28 | Laguna M.1 | Poolside | Closed | 74.6% | |
| 29 | Claude 4.1 Opus | Anthropic | Closed | 74.5% | |
| 30 | Hy3 Preview | Tencent | Open | 74.4% | |
| 31 | GLM-4.7 | Z.AI | Open | 73.8% | |
| 32 | DeepSeek V4 Flash | DeepSeek | Open | 73.7% | |
| 33 | DeepSeek V4 Pro | DeepSeek | Open | 73.6% | |
| 34 | Qwen3.6-35B-A3B | Alibaba | Open | 73.4% | |
| 35 | MiMo-V2-Flash | Xiaomi | Open | 73.4% | |
| 36 | Claude Haiku 4.5 | Anthropic | Closed | 73.3% | |
| 37 | Claude 4 Sonnet | Anthropic | Closed | 72.7% | |
| 38 | Qwen3.5-27B | Alibaba | Open | 72.4% | |
| 39 | Qwen3.5-122B-A10B | Alibaba | Open | 72% | |
| 40 | Grok Code Fast 1 | xAI | Closed | 70.8% | |
| 41 | Laguna XS.2 | Poolside | Open | 69.9% | |
| 42 | Qwen3.5-35B-A3B | Alibaba | Open | 69.2% | |
| 43 | Gemini 2.5 Pro | Google | Closed | 63.8% | |
| 44 | GPT-4.1 | OpenAI | Closed | 54.6% | |
| 45 | ZAYA1-74B-Preview | Zyphra | Open | 53.2% | |
| 46 | o3-mini | OpenAI | Closed | 49.3% | |
| 47 | Claude 3.5 Sonnet | Anthropic | Closed | 49% | |
| 48 | DeepSeek V3 | DeepSeek | Open | 42% | |
| 49 | GPT-4.1 mini | OpenAI | Closed | 23.6% | |