| # | Model | Provider | Type | Score | |
|---|---|---|---|---|---|
| 1 | Gemini 3.5 Flash | Google | Closed | 47.1% | |
| 2 | GPT-5.5 | OpenAI | Closed | 37.7% | |
| 3 | GPT-5.4 | OpenAI | Closed | 33.3% | |
| 4 | Claude Opus 4.6 (Adaptive) | Anthropic | Closed | 33.0% | |
| 5 | Gemini 3.1 Pro | Google | Closed | 32.0% | |
| 6 | Kimi K2.6 | Moonshot AI | Open | 28.5% | |
| 7 | GPT-5.4 mini | OpenAI | Closed | 28.2% | |
| 8 | MiniMax M3 | MiniMax | Open | 27.7% | |
| 9 | GPT-5.4 nano | OpenAI | Closed | 24.9% | |
| 10 | DeepSeek V4 Pro (Max) | DeepSeek | Open | 24.3% | |
| 11 | Grok 4.3 | xAI | Closed | 17.0% | |
| 12 | Qwen3.5 397B (Reasoning) | Alibaba | Open | 15.3% | |
| 13 | GLM-5 | Z.AI | Open | 14.5% | |
| 14 | Gemini 3.1 Flash-Lite | Google | Closed | 12.2% | |
| 15 | Kimi K2.5 (Reasoning) | Moonshot AI | Closed | 11.5% | |
| 16 | Kimi K2.5 | Moonshot AI | Open | 11.5% | |
| 17 | MiniMax M2.7 | MiniMax | Open | 10.6% | |
| 18 | GPT-OSS 120B | OpenAI | Open | 3.1% | |
| 19 | MiMo-V2.5-Pro | Xiaomi | Closed | 2.4% | |
| 20 | GPT-OSS 20B | OpenAI | Open | 0.7% | |