SWE Multilingual
20 models evaluated
| Rank | Model | Developer | Open/Closed | Score |
|---|---|---|---|---|
| 1 | Claude Opus 4.8 | Anthropic | Closed | 84.4% |
| 2 | Composer 2.5 | Cursor | Closed | 79.8% |
| 3 | Qwen3.7 Max | Alibaba | Closed | 78.3% |
| 4 | Claude Opus 4.5 | Anthropic | Closed | 77.5% |
| 5 | Kimi K2.6 | Moonshot AI | Open | 76.7% |
| 6 | MiniMax M2.7 | MiniMax | Open | 76.5% |
| 7 | DeepSeek V4 Pro (Max) | DeepSeek | Open | 76.2% |
| 8 | DeepSeek V4 Pro (High) | DeepSeek | Open | 74.1% |
| 9 | Qwen3.6 Plus | Alibaba | Closed | 73.8% |
| 10 | Composer 2 | Cursor | Closed | 73.7% |
| 11 | DeepSeek V4 Flash (Max) | DeepSeek | Open | 73.3% |
| 12 | GLM-5 | Z.AI | Open | 73.3% |
| 13 | Kimi K2.5 | Moonshot AI | Open | 73% |
| 14 | Qwen3.6-27B | Alibaba | Open | 71.3% |
| 15 | DeepSeek V4 Flash (High) | DeepSeek | Open | 70.2% |
| 16 | DeepSeek V4 Pro | DeepSeek | Open | 69.8% |
| 17 | DeepSeek V4 Flash | DeepSeek | Open | 69.7% |
| 18 | Qwen3.6-35B-A3B | Alibaba | Open | 67.2% |
| 19 | Laguna M.1 | Poolside | Closed | 63.1% |
| 20 | Laguna XS.2 | Poolside | Open | 57.7% |