BenchLM Benchmarks
165 benchmarks · 147 model scores · Data from Jun 2, 2026
Instruction Following · 4 benchmarks
ifeval
19 models
1. Qwen3.5-27B · Alibaba · 95%
2. Qwen3.7 Max · Alibaba · 94.3%
3. Qwen3.6 Plus · Alibaba · 94.3%
4. Kimi K2.5 · Moonshot AI · 93.9%
5. o3-mini · OpenAI · 93.9%
+14 more

if Bench
11 models
1. Grok 4.3 · xAI · 81.3%
2. Qwen3.7 Max · Alibaba · 79.1%
3. Gemini 3.5 Flash · Google · 76.3%
4. Qwen3.6 Plus · Alibaba · 75.8%
5. Nemotron 3 Nano Omni 30B A3B · NVIDIA · 74.2%
+6 more

aa If Bench
116 models
1. Grok 4.3 · xAI · 81.3%
2. Qwen3.7 Max · Alibaba · 80.5%
3. MiMo-V2.5-Pro · Xiaomi · 79.9%
4. DeepSeek V4 Flash (Max) · DeepSeek · 79.2%
5. Qwen3.5 397B (Reasoning) · Alibaba · 78.8%
+111 more