MMMU-Pro
28 models evaluated

| Rank | Model | Organization | Access | Score | |
| --- | --- | --- | --- | --- | --- |
| 1 | GPT-5.4 Pro | OpenAI | Closed | 94% | |
| 2 | Claude Mythos Preview | Anthropic | Closed | 92.7% | |
| 3 | Gemini 3.1 Pro | Google | Closed | 83.9% | |
| 4 | Gemini 3.5 Flash | Google | Closed | 83.6% | |
| 5 | GPT-5.5 | OpenAI | Closed | 81.2% | |
| 6 | GPT-5.4 | OpenAI | Closed | 81.2% | |
| 7 | Gemini 3 Pro | Google | Closed | 81% | |
| 8 | Muse Spark | Meta | Closed | 80.4% | |
| 9 | GPT-5.2 | OpenAI | Closed | 79.5% | |
| 10 | Kimi K2.6 | Moonshot AI | Open | 79.4% | |
| 11 | Qwen3.5 397B | Alibaba | Open | 79% | |
| 12 | Qwen3.6 Plus | Alibaba | Closed | 78.8% | |
| 13 | Kimi K2.5 (Reasoning) | Moonshot AI | Closed | 78.5% | |
| 14 | Kimi K2.5 | Moonshot AI | Open | 78.5% | |
| 15 | MiniMax M3 | MiniMax | Open | 78.1% | |
| 16 | Grok 4.3 | xAI | Closed | 78.1% | |
| 17 | MiMo-V2.5 | Xiaomi | Closed | 77.9% | |
| 18 | Claude Opus 4.6 | Anthropic | Closed | 77.3% | |
| 19 | Gemma 4 31B | Google | Open | 76.9% | |
| 20 | GPT-5.4 mini | OpenAI | Closed | 76.6% | |
| 21 | Qwen3.6-27B | Alibaba | Open | 75.8% | |
| 22 | Qwen3.6-35B-A3B | Alibaba | Open | 75.3% | |
| 23 | Grok 4.20 | xAI | Closed | 75.2% | |
| 24 | Gemma 4 26B A4B | Google | Open | 73.8% | |
| 25 | Interfaze Beta | Interfaze | Closed | 71.1% | |
| 26 | Claude Opus 4.5 | Anthropic | Closed | 70.6% | |
| 27 | GPT-5.4 nano | OpenAI | Closed | 66.1% | |
| 28 | Command A+ | Cohere | Open | 63% | |