
BenchLM Benchmarks

165 benchmarks · 22 model scores · Data from Jun 2, 2026

Multilingual · 6 benchmarks

MGSM

2 models

1 · DeepSeek V4 Flash Base · DeepSeek · 85.7%
2 · DeepSeek V4 Pro Base · DeepSeek · 84.4%

MMLU-ProX

10 models

1 · Qwen3.7 Max · Alibaba · 87%
2 · Claude Opus 4.5 · Anthropic · 85.7%
3 · Qwen3.6 Plus · Alibaba · 84.7%
4 · Qwen3.5 397B · Alibaba · 84.7%
5 · GLM-5 · Z.AI · 83.1%
(5 more models not shown)

nova63

6 models

1 · Qwen3.5 397B · Alibaba · 59.1%
2 · Qwen3.7 Max · Alibaba · 59.0%
3 · Qwen3.6 Plus · Alibaba · 57.9%
4 · Claude Opus 4.5 · Anthropic · 56.7%
5 · Kimi K2.5 · Moonshot AI · 56.0%
(1 more model not shown)

INCLUDE

2 models

1 · Claude Opus 4.8 · Anthropic · 87.6%
2 · Qwen3.7 Max · Alibaba · 86.2%

PolyMath

1 model

1 · Qwen3.7 Max · Alibaba · 86.5%

maxife

1 model

1 · Qwen3.7 Max · Alibaba · 89.2%