| # | Model | Provider | Type | Score | |
|---|---|---|---|---|---|
| 1 | Claude Mythos Preview | Anthropic | Closed | 64.7% | |
| 2 | GPT-5.4 Pro | OpenAI | Closed | 58.7% | |
| 3 | Claude Opus 4.8 | Anthropic | Closed | 57.9% | |
| 4 | GPT-5.5 Pro | OpenAI | Closed | 57.2% | |
| 5 | Claude Opus 4.7 (Adaptive) | Anthropic | Closed | 54.7% | |
| 6 | Claude Opus 4.6 | Anthropic | Closed | 53% | |
| 7 | GLM-5.1 | Z.AI | Open | 52.3% | |
| 8 | GPT-5.5 | OpenAI | Closed | 52.2% | |
| 9 | GPT-5.4 | OpenAI | Closed | 52.1% | |
| 10 | GLM-5 | Z.AI | Open | 50.4% | |
| 11 | Muse Spark | Meta | Closed | 50.4% | |
| 12 | Claude Sonnet 4.6 | Anthropic | Closed | 49% | |
| 13 | MiMo-V2.5-Pro | Xiaomi | Closed | 48% | |
| 14 | GPT-5.4 mini | OpenAI | Closed | 41.5% | |
| 15 | Qwen3.7 Max | Alibaba | Closed | 41.4% | |
| 16 | Gemini 3.5 Flash | Google | Closed | 40.2% | |
| 17 | DeepSeek V4 Pro (Max) | DeepSeek | Open | 37.7% | |
| 18 | GPT-5.4 nano | OpenAI | Closed | 37.7% | |
| 19 | Grok 4.3 | xAI | Closed | 35% | |
| 20 | DeepSeek V4 Flash (Max) | DeepSeek | Open | 34.8% | |
| 21 | Kimi K2.6 | Moonshot AI | Open | 34.7% | |
| 22 | DeepSeek V4 Pro (High) | DeepSeek | Open | 34.5% | |
| 23 | Claude Opus 4.5 | Anthropic | Closed | 30.8% | |
| 24 | Kimi K2.5 | Moonshot AI | Open | 30.1% | |
| 25 | DeepSeek V4 Flash (High) | DeepSeek | Open | 29.4% | |
| 26 | Qwen3.6 Plus | Alibaba | Closed | 28.8% | |
| 27 | Qwen3.5 397B | Alibaba | Open | 28.7% | |
| 28 | Gemma 4 31B | Google | Open | 26.5% | |
| 29 | Hy3 Preview | Tencent | Open | 25.5% | |
| 30 | GLM-4.7 | Z.AI | Open | 24.8% | |
| 31 | Qwen3.6-27B | Alibaba | Open | 24% | |
| 32 | Qwen3.6-35B-A3B | Alibaba | Open | 21.4% | |
| 33 | Gemini 2.5 Pro | Google | Closed | 18.8% | |
| 34 | Gemma 4 26B A4B | Google | Open | 17.2% | |
| 35 | DeepSeek V4 Flash | DeepSeek | Open | 8.1% | |
| 36 | DeepSeek V4 Pro | DeepSeek | Open | 7.7% | |