| # | Model | Provider | Type | Score | |
|---|---|---|---|---|---|
| 1 | Claude Opus 4.8 | Anthropic | Closed | 93.1% | |
| 2 | Step 3.7 Flash | StepFun | Open | 92.8% | |
| 3 | Kimi K2.6 | Moonshot AI | Open | 92.5% | |
| 4 | Kimi K2.5 | Moonshot AI | Open | 77.1% | |
| 5 | Muse Spark | Meta | Closed | 74.8% | |
| 6 | Claude Opus 4.6 | Anthropic | Closed | 73.7% | |
| 7 | GPT-5.4 | OpenAI | Closed | 73.6% | |
| 8 | Gemini 3.1 Pro | Google | Closed | 69.7% | |
| 9 | Grok 4.20 | xAI | Closed | 62.8% | |