| # | Model | Provider | Type | Score | |
|---|---|---|---|---|---|
| 1 | Claude Mythos Preview | Anthropic | Closed | 56.8% | |
| 2 | Claude Opus 4.8 | Anthropic | Closed | 49.8% | |
| 3 | Claude Opus 4.7 (Adaptive) | Anthropic | Closed | 46.9% | |
| 4 | Gemini 3.1 Pro | Google | Closed | 45.4% | |
| 5 | GPT-5.5 Pro | OpenAI | Closed | 43.1% | |
| 6 | Muse Spark | Meta | Closed | 42.8% | |
| 7 | GPT-5.4 Pro | OpenAI | Closed | 42.7% | |
| 8 | GPT-5.5 | OpenAI | Closed | 41.4% | |
| 9 | Claude Opus 4.6 | Anthropic | Closed | 40% | |
| 10 | GPT-5.4 | OpenAI | Closed | 39.8% | |
| 11 | MiMo-V2.5-Pro | Xiaomi | Closed | 34% | |
| 12 | Grok 4.20 | xAI | Closed | 31.6% | |
| 13 | GPT-5.4 mini | OpenAI | Closed | 28.2% | |
| 14 | GPT-5.4 nano | OpenAI | Closed | 24.3% | |
| 15 | Gemma 4 31B | Google | Open | 19.5% | |
| 16 | Gemma 4 26B A4B | Google | Open | 8.7% | |