gert Labs
54 models evaluated
| Rank | Model | Provider | Open/Closed | Score |
|---|---|---|---|---|
| 1 | Claude Opus 4.8 | Anthropic | Closed | 72.97% |
| 2 | GPT-5.5 | OpenAI | Closed | 72.93% |
| 3 | Claude Opus 4.7 | Anthropic | Closed | 65.59% |
| 4 | GPT-5.4 | OpenAI | Closed | 64.89% |
| 5 | Qwen3.7 Max | Alibaba | Closed | 64.27% |
| 6 | Claude Opus 4.5 | Anthropic | Closed | 64.23% |
| 7 | Gemini 3 Pro | Google | Closed | 63.23% |
| 8 | Claude Sonnet 4.6 | Anthropic | Closed | 62.92% |
| 9 | MiMo-V2.5-Pro | Xiaomi | Closed | 62.70% |
| 10 | Claude Opus 4.6 | Anthropic | Closed | 61.85% |
| 11 | Gemini 3.5 Flash | Google | Closed | 61.85% |
| 12 | GLM-5.1 | Z.AI | Open | 60.11% |
| 13 | GPT-5.3 Codex | OpenAI | Closed | 57.47% |
| 14 | Gemini 3.1 Pro | Google | Closed | 56.87% |
| 15 | Kimi K2.6 | Moonshot AI | Open | 56.82% |
| 16 | Gemini 3 Flash | Google | Closed | 56.63% |
| 17 | Qwen3.6-27B | Alibaba | Open | 54.84% |
| 18 | DeepSeek V4 Flash | DeepSeek | Open | 54.35% |
| 19 | GPT-5.2-Codex | OpenAI | Closed | 51.79% |
| 20 | Step 3.7 Flash | StepFun | Open | 51.57% |
| 21 | GLM-5 | Z.AI | Open | 50.99% |
| 22 | Qwen3.6 Plus | Alibaba | Closed | 50.60% |
| 23 | DeepSeek V4 Pro | DeepSeek | Open | 50.28% |
| 24 | GPT-5.1-Codex | OpenAI | Closed | 49.68% |
| 25 | Grok Build 0.1 | xAI | Closed | 49.15% |
| 26 | Claude Sonnet 4.5 | Anthropic | Closed | 48.51% |
| 27 | Grok 4.1 Fast | xAI | Closed | 47.32% |
| 28 | MiMo-V2.5 | Xiaomi | Closed | 46.89% |
| 29 | Qwen3.5 397B | Alibaba | Open | 46.76% |
| 30 | GPT-5.2 | OpenAI | Closed | 46.54% |
| 31 | Kimi K2.5 | Moonshot AI | Open | 45.88% |
| 32 | Grok 4.3 | xAI | Closed | 43.86% |
| 33 | Qwen3 Max | Alibaba | Closed | 43.74% |
| 34 | Qwen3.6-35B-A3B | Alibaba | Open | 42.65% |
| 35 | Grok 4 | xAI | Closed | 42.34% |
| 36 | Gemini 2.5 Pro | Google | Closed | 42.01% |
| 37 | GPT-5.1 | OpenAI | Closed | 41.24% |
| 38 | MiniMax M2.7 | MiniMax | Open | 40.40% |
| 39 | GLM-4.7 | Z.AI | Open | 39.95% |
| 40 | Claude 4 Sonnet | Anthropic | Closed | 39.66% |
| 41 | Qwen3.5-27B | Alibaba | Open | 39.41% |
| 42 | Mistral Medium 3.5 128B | Mistral | Open | 39.10% |
| 43 | Gemini 3.1 Flash-Lite | Google | Closed | 38.46% |
| 44 | Grok 4.20 | xAI | Closed | 38.36% |
| 45 | Hy3 Preview | Tencent | Open | 36.91% |
| 46 | MiMo-V2-Pro | Xiaomi | Closed | 36.68% |
| 47 | Gemma 4 31B | Google | Open | 35.26% |
| 48 | Kimi K2.5 (Reasoning) | Moonshot AI | Closed | 32.58% |
| 49 | Trinity-Large-Thinking | Arcee AI | Open | 32.55% |
| 50 | GLM-5V-Turbo | Z.AI | Closed | 30.76% |
| 51 | GPT-OSS 120B | OpenAI | Open | 29.61% |
| 52 | DeepSeek V3.2 | DeepSeek | Open | 29.57% |
| 53 | Qwen3.5-35B-A3B | Alibaba | Open | 28.96% |
| 54 | GPT-4.1 | OpenAI | Closed | 25.65% |