IFEval
19 models evaluated
| Rank | Model | Organization | Open/Closed | Score | |
|---|---|---|---|---|---|
| 1 | Qwen3.5-27B | Alibaba | Open | 95% | |
| 2 | Qwen3.7 Max | Alibaba | Closed | 94.3% | |
| 3 | Qwen3.6 Plus | Alibaba | Closed | 94.3% | |
| 4 | Kimi K2.5 | Moonshot AI | Open | 93.9% | |
| 5 | o3-mini | OpenAI | Closed | 93.9% | |
| 6 | Qwen3.5-122B-A10B | Alibaba | Open | 93.4% | |
| 7 | GLM-5 | Z.AI | Open | 92.6% | |
| 8 | Qwen3.5 397B | Alibaba | Open | 92.6% | |
| 9 | o1 | OpenAI | Closed | 92.2% | |
| 10 | Qwen3.5-35B-A3B | Alibaba | Open | 91.9% | |
| 11 | LFM2.5-8B-A1B | LiquidAI | Open | 91.8% | |
| 12 | Claude Opus 4.5 | Anthropic | Closed | 90.9% | |
| 13 | GPT-4.1 mini | OpenAI | Closed | 88.5% | |
| 14 | GPT-4.1 | OpenAI | Closed | 87.4% | |
| 15 | DeepSeek V3 | DeepSeek | Open | 86.1% | |
| 16 | ZAYA1-8B | Zyphra | Open | 85.6% | |
| 17 | GPT-4.1 nano | OpenAI | Closed | 83.2% | |
| 18 | MiniCPM5-1B | OpenBMB | Open | 80.4% | |
| 19 | LFM2.5-VL-450M | LiquidAI | Open | 61.2% | |