BenchLM Benchmarks

207 benchmarks · 182 model scores · Data from Jul 25, 2026

Instruction Following · 4 benchmarks

21 models

4 · Qwen3.7 Max · Alibaba · Closed · 94.3%

5 · Qwen3.6 Plus · Alibaba · Closed · 94.3%

6 · Kimi K2.5 · Moonshot AI · Open weight · 93.9%

7 · o3-mini · OpenAI · Closed · 93.9%

8 · Qwen3.5-122B-A10B · Alibaba · Open weight · 93.4%

15 models

1 · MAI-Thinking-1 · Microsoft · mai-thinking-1

3 · Grok 4.3 · xAI · grok-4-3

4 · Inkling · Thinking Machines Lab · Open weight

5 · Qwen3.7 Max · Alibaba · Closed

6 · Qwen3.7 Plus · Alibaba · Closed

145 models

1 · MiniMax M3 · MiniMax · minimax-m3

3 · Grok 4.3 · xAI · grok-4-3

4 · Qwen3.7 Max · Alibaba · Closed

5 · MiMo-V2.5-Pro · Xiaomi · Closed

6 · DeepSeek V4 Flash (Max) · DeepSeek · Open weight

1 model

1 · Interfaze Beta · Interfaze · interfaze-beta