| # | Model | Provider | Type | Score | |
|---|---|---|---|---|---|
| 1 | Claude Mythos Preview | Anthropic | Closed | 93.2% | |
| 2 | Claude Opus 4.7 (Adaptive) | Anthropic | Closed | 91% | |
| 3 | Claude Opus 4.8 | Anthropic | Closed | 89.9% | |
| 4 | Muse Spark | Meta | Closed | 86.4% | |
| 5 | Gemini 3.5 Flash | Google | Closed | 84.2% | |
| 6 | GPT-5.4 | OpenAI | Closed | 82.8% | |
| 7 | GPT-5.2 | OpenAI | Closed | 82.1% | |
| 8 | Qwen3.6 Plus | Alibaba | Closed | 81.5% | |
| 9 | Gemini 3 Pro | Google | Closed | 81.4% | |
| 10 | MiMo-V2.5 | Xiaomi | Closed | 81% | |
| 11 | Qwen3.5 397B | Alibaba | Open | 80.8% | |
| 12 | Kimi K2.6 | Moonshot AI | Open | 80.4% | |
| 13 | Gemini 3.1 Pro | Google | Closed | 80.2% | |
| 14 | Qwen3.6-27B | Alibaba | Open | 78.4% | |
| 15 | Qwen3.6-35B-A3B | Alibaba | Open | 78% | |
| 16 | Claude Sonnet 4.6 | Anthropic | Closed | 77.4% | |
| 17 | Qwen3.5-122B-A10B | Alibaba | Open | 77.2% | |
| 18 | Nemotron 3 Nano Omni 30B A3B | NVIDIA | Open | 76.3% | |
| 19 | Gemini 3.1 Flash-Lite | Google | Closed | 73.2% | |
| 20 | Claude Opus 4.5 | Anthropic | Closed | 68.5% | |
| 21 | Grok 4.20 | xAI | Closed | 60.9% | |
| 22 | Command A+ | Cohere | Open | 52.7% | |