| # | Model | Provider | Type | Score |
|---|---|---|---|---|
| 1 | Claude Opus 4.5 | Anthropic | Closed | 64.4% |
| 2 | Qwen3.5 397B | Alibaba | Open | 63.2% |
| 3 | Qwen3.6 Plus | Alibaba | Closed | 62% |
| 4 | Kimi K2.5 | Moonshot AI | Open | 61% |
| 5 | GLM-5 | Z.AI | Open | 60.8% |
| 6 | Qwen3.5-27B | Alibaba | Open | 60.6% |
| 7 | Qwen3.5-122B-A10B | Alibaba | Open | 60.2% |
| 8 | Qwen3.5-35B-A3B | Alibaba | Open | 59% |
| 9 | DeepSeek V4 Pro Base | DeepSeek | Open | 51.5% |
| 10 | DeepSeek V4 Flash Base | DeepSeek | Open | 44.7% |