SuperGPQA
18 models evaluated

| Rank | Model | Developer | Access | Score |
|---:|---|---|---|---:|
| 1 | Claude Opus 4.6 | Anthropic | Closed | 95% |
| 2 | Claude Sonnet 4.6 | Anthropic | Closed | 95% |
| 3 | Qwen 3.6 Max (preview) | Alibaba | Closed | 73.9% |
| 4 | Qwen3.7 Max | Alibaba | Closed | 73.6% |
| 5 | Qwen3.6 Plus | Alibaba | Closed | 71.6% |
| 6 | Claude Opus 4.5 | Anthropic | Closed | 70.6% |
| 7 | Qwen3.5 397B | Alibaba | Open | 70.4% |
| 8 | Kimi K2.5 | Moonshot AI | Open | 69.2% |
| 9 | Qwen3.5-122B-A10B | Alibaba | Open | 67.1% |
| 10 | GLM-5 | Z.AI | Open | 66.8% |
| 11 | Qwen3.6-27B | Alibaba | Open | 66% |
| 12 | Qwen3.5-27B | Alibaba | Open | 65.6% |
| 13 | Qwen3.6-35B-A3B | Alibaba | Open | 64.7% |
| 14 | Qwen3.5-35B-A3B | Alibaba | Open | 63.4% |
| 15 | Qwen3 235B 2507 | Alibaba | Open | 62.6% |
| 16 | DeepSeek V4 Pro Base | DeepSeek | Open | 53.9% |
| 17 | DeepSeek V4 Flash Base | DeepSeek | Open | 46.5% |
| 18 | MiniCPM5-1B | OpenBMB | Open | 23.1% |