| # | Model | Provider | Type | Score | |
|---|---|---|---|---|---|
| 1 | Claude Opus 4.8 | Anthropic | Closed | 83.4% | |
| 2 | Holo3-35B-A3B | H Company | Open | 82.6% | |
| 3 | Claude Mythos Preview | Anthropic | Closed | 79.6% | |
| 4 | Holo3-122B-A10B | H Company | Closed | 78.8% | |
| 5 | GPT-5.5 | OpenAI | Closed | 78.7% | |
| 6 | Gemini 3.5 Flash | Google | Closed | 78.4% | |
| 7 | Claude Opus 4.7 (Adaptive) | Anthropic | Closed | 78% | |
| 8 | GPT-5.4 | OpenAI | Closed | 75% | |
| 9 | Kimi K2.6 | Moonshot AI | Open | 73.1% | |
| 10 | Claude Opus 4.6 | Anthropic | Closed | 72.7% | |
| 11 | Claude Sonnet 4.6 | Anthropic | Closed | 72.1% | |
| 12 | GPT-5.4 mini | OpenAI | Closed | 72.1% | |
| 13 | MiniMax M3 | MiniMax | Open | 70.1% | |
| 14 | Claude Opus 4.5 | Anthropic | Closed | 66.3% | |
| 15 | GPT-5.3 Codex | OpenAI | Closed | 64.7% | |
| 16 | Claude Sonnet 4.5 | Anthropic | Closed | 61.4% | |
| 17 | Qwen3.5-122B-A10B | Alibaba | Open | 58% | |
| 18 | Qwen3.5-27B | Alibaba | Open | 56.2% | |
| 19 | Qwen3.5-35B-A3B | Alibaba | Open | 54.5% | |
| 20 | GPT-5.2 | OpenAI | Closed | 47.3% | |
| 21 | GPT-5.4 nano | OpenAI | Closed | 39% | |