| # | Model | Model ID | Provider | Type | Score |
|---|---|---|---|---|---|
| 1 | Claude Opus 4.7 (Adaptive) | claude-opus-4-7 | Anthropic | Closed | |
| 2 | Claude Opus 4.6 | claude-opus-4-6 | Anthropic | Closed | |
| 3 | GPT-5.5 | | OpenAI | Closed | 65.9% |
| 4 | GPT-5.4 | | OpenAI | Closed | 64.3% |
| 5 | Claude Opus 4.5 | claude-opus-4-5 | Anthropic | Closed | |
| 6 | GPT-5.2 | | OpenAI | Closed | 58.8% |
| 7 | GPT-5.2-Codex | | OpenAI | Closed | 58.3% |
| 8 | GLM-5.1 | | Z.AI | Open | 58.2% |
| 9 | Gemini 3.1 Pro | Gemini-3.1-Pro | Google | Closed | |
| 10 | Claude Sonnet 4.5 | claude-sonnet-4-5 | Anthropic | Closed | |
| 11 | Qwen3.6 Plus | Qwen3.6-Plus | Alibaba | Closed | |
| 12 | GLM-5 | | Z.AI | Open | 49.4% |
| 13 | Kimi K2.5 | Kimi-K2.5 | Moonshot AI | Open | |
| 14 | Gemini 3 Pro | Gemini-3-Pro | Google | Closed | |
| 15 | Gemini 3 Flash | Gemini-3-Flash | Google | Closed | |
| 16 | DeepSeek V3.2 (Thinking) | DeepSeek-V3.2-Reasoner | DeepSeek | Open | |
| 17 | MiniMax M2.5 | MiniMax-M2.5 | MiniMax | Closed | |
| 18 | Claude Sonnet 4.6 | claude-sonnet-4-6 | Anthropic | Closed | |
| 19 | MiniMax M2.7 | Minimax-2.7 | MiniMax | Open | |
| 20 | GLM-4.7 | | Z.AI | Open | 42.3% |
| 22 | Kimi K2.5 (Reasoning) | Kimi-K2-Thinking | Moonshot AI | Closed | |
| 23 | Qwen3.5 Flash | Qwen3.5-Flash | Alibaba | Closed | |
| 24 | Nemotron 3 Super 120B A12B | Nemotron-3-Super | NVIDIA | Open | |
| 27 | Kimi K2.6 | Kimi-K2.6 | Moonshot AI | Open | |
| 28 | Trinity-Large-Thinking | | Arcee AI | Open | 25.4% |