| # | Model | Provider | Type | Score | |
|---|---|---|---|---|---|
| 1 | GPT-5.5 | OpenAI | Closed | 82.0% | |
| 2 | Gemini 3.5 Flash | Google | Closed | 76.2% | |
| 3 | Claude Opus 4.8 | Anthropic | Closed | 74.6% | |
| 4 | Qwen3.7 Max | Alibaba | Closed | 69.7% | |
| 5 | Claude Opus 4.7 (Adaptive) | Anthropic | Closed | 69.4% | |
| 6 | Composer 2.5 | Cursor | Closed | 69.3% | |
| 7 | MiMo-V2.5-Pro | Xiaomi | Closed | 68.4% | |
| 8 | DeepSeek V4 Pro (Max) | DeepSeek | Open | 67.9% | |
| 9 | Kimi K2.6 | Moonshot AI | Open | 66.7% | |
| 10 | MiniMax M3 | MiniMax | Open | 66.0% | |
| 11 | MiMo-V2.5 | Xiaomi | Closed | 65.8% | |
| 12 | Qwen 3.6 Max (preview) | Alibaba | Closed | 65.4% | |
| 13 | DeepSeek V4 Pro (High) | DeepSeek | Open | 63.3% | |
| 14 | Composer 2 | Cursor | Closed | 61.7% | |
| 15 | Step 3.7 Flash | StepFun | Open | 59.5% | |
| 16 | Qwen3.6-27B | Alibaba | Open | 59.3% | |
| 17 | DeepSeek V4 Pro | DeepSeek | Open | 59.1% | |
| 18 | DeepSeek V4 Flash (Max) | DeepSeek | Open | 56.9% | |
| 19 | DeepSeek V4 Flash (High) | DeepSeek | Open | 56.6% | |
| 20 | Hy3 Preview | Tencent | Open | 54.4% | |
| 21 | Qwen3.6-35B-A3B | Alibaba | Open | 51.5% | |
| 22 | DeepSeek V4 Flash | DeepSeek | Open | 49.1% | |
| 23 | Laguna M.1 | Poolside | Closed | 45.8% | |
| 24 | Laguna XS.2 | Poolside | Open | 35.7% | |