Inference Bench
14 models evaluated

| Rank | Model | Organization | Access | Score | |
| --- | --- | --- | --- | --- | --- |
| 1 | Claude Sonnet 4.6 | Anthropic | Closed | 8.08x | |
| 2 | GLM-5 | Z.AI | Open | 6.20x | |
| 3 | Gemini 3.1 Pro | Google | Closed | 6.16x | |
| 4 | GPT-5.3 Codex (High) | OpenAI | Closed | 5.48x | |
| 5 | GPT-5.4 (High) | OpenAI | Closed | 5.08x | |
| 7 | GPT-5.5 (High) | OpenAI | Closed | 4.22x | |
| 8 | Claude Opus 4.6 | Anthropic | Closed | 3.89x | |
| 9 | GPT-5.2 | OpenAI | Closed | 3.82x | |
| 10 | GPT-5.1 Codex Max | OpenAI | Closed | 3.54x | |
| 11 | Claude Opus 4.5 | Anthropic | Closed | 3.37x | |
| 12 | Claude Sonnet 4.5 | Anthropic | Closed | 2.96x | |
| 13 | Claude Opus 4.7 | Anthropic | Closed | 2.25x | |
| 14 | GPT-5.2 Codex | OpenAI | Closed | 1.55x | |
| 15 | Claude Haiku 4.5 | Anthropic | Closed | 1.24x | |