HMMT Feb 2026
18 models evaluated
| Rank | Model | Organization | Open/Closed | Score |
| --- | --- | --- | --- | --- |
| 1 | Qwen3.7 Max | Alibaba | Closed | 97.1% |
| 2 | DeepSeek V4 Pro (Max) | DeepSeek | Open | 95.2% |
| 3 | DeepSeek V4 Flash (Max) | DeepSeek | Open | 94.8% |
| 4 | DeepSeek V4 Pro (High) | DeepSeek | Open | 94.0% |
| 5 | Kimi K2.6 | Moonshot AI | Open | 92.7% |
| 6 | DeepSeek V4 Flash (High) | DeepSeek | Open | 91.9% |
| 7 | Qwen3.5 397B | Alibaba | Open | 87.9% |
| 8 | Qwen3.6 Plus | Alibaba | Closed | 87.8% |
| 9 | Kimi K2.5 | Moonshot AI | Open | 87.1% |
| 10 | GLM-5 | Z.AI | Open | 86.4% |
| 11 | Claude Opus 4.5 | Anthropic | Closed | 85.3% |
| 12 | Qwen3.6-27B | Alibaba | Open | 84.3% |
| 13 | Qwen3.6-35B-A3B | Alibaba | Open | 83.6% |
| 14 | GLM-5.1 | Z.AI | Open | 82.6% |
| 15 | ZAYA1-8B | Zyphra | Open | 71.6% |
| 16 | DeepSeek V4 Flash | DeepSeek | Open | 40.8% |
| 17 | DeepSeek V4 Pro | DeepSeek | Open | 31.7% |
| 18 | MiniCPM5-1B | OpenBMB | Open | 25.8% |