context.vn

BenchLM Benchmarks

165 benchmarks · 240 model scores · Data from Jun 2, 2026

Multimodal · 35 benchmarks

MMMU

9 models

1. Qwen3.6 Plus · Alibaba · 86.0%
2. Qwen3.5-122B-A10B · Alibaba · 83.9%
3. Qwen3.6-27B · Alibaba · 82.9%
4. Qwen3.5-27B · Alibaba · 82.3%
5. Qwen3.6-35B-A3B · Alibaba · 81.7%
+4 more
MMMU-Pro

28 models

1. GPT-5.4 Pro · OpenAI · 94%
2. Claude Mythos Preview · Anthropic · 92.7%
3. Gemini 3.1 Pro · Google · 83.9%
4. Gemini 3.5 Flash · Google · 83.6%
5. GPT-5.5 · OpenAI · 81.2%
+23 more
AA MMMU-Pro

68 models

1. Gemini 3.5 Flash · Google · 84.3%
2. Gemini 3.1 Pro · Google · 82.4%
3. Muse Spark · Meta · 80.5%
4. Gemini 3 Pro · Google · 80.2%
5. GPT-5.5 · OpenAI · 79.9%
+63 more
OCRBench v2

1 model

1. Interfaze Beta · Interfaze · 70.7%
olmOCR

1 model

1. Interfaze Beta · Interfaze · 85.7%
VoxPopuli WER

1 model

1. Interfaze Beta · Interfaze · 2.4%
Design Arena Website

1 model

1. Grok 4.3 · xAI · 1294
OfficeQA Pro

5 models

1. Claude Opus 4.8 · Anthropic · 66.2%
2. GPT-5.5 · OpenAI · 54.1%
3. GPT-5.4 · OpenAI · 53.2%
4. MiniMax M3 · MiniMax · 45.1%
5. Claude Opus 4.7 (Adaptive) · Anthropic · 43.6%
MMMU-Pro Python

5 models

1. GPT-5.5 · OpenAI · 83.2%
2. GPT-5.4 · OpenAI · 82.1%
3. Kimi K2.6 · Moonshot AI · 80.1%
4. GPT-5.4 mini · OpenAI · 78%
5. GPT-5.4 nano · OpenAI · 69.5%
OmniDocBench 1.5

2 models

1. MiniMax M3 · MiniMax · 91.6%
2. Qwen3.6-35B-A3B · Alibaba · 89.9%
RealWorldQA

3 models

1. Qwen3.6-35B-A3B · Alibaba · 85.3%
2. Qwen3.6-27B · Alibaba · 84.1%
3. LFM2.5-VL-450M · LiquidAI · 58.4%
Video-MME (with subtitles)

4 models

1. Qwen3.6-27B · Alibaba · 87.7%
2. MiMo-V2.5 · Xiaomi · 87.7%
3. Qwen3.6-35B-A3B · Alibaba · 86.6%
4. MiniMax M3 · MiniMax · 85.4%
Video-MME (no subtitles)

2 models

1. Qwen3.6-35B-A3B · Alibaba · 82.5%
2. Nemotron 3 Nano Omni 30B A3B · NVIDIA · 72.2%
Video-MME

1 model

1. Kimi K2.5 · Moonshot AI · 87.4%
MathVision

9 models

1. Qwen3.5 397B · Alibaba · 88.6%
2. Qwen3.6 Plus · Alibaba · 88.0%
3. Kimi K2.6 · Moonshot AI · 87.4%
4. Gemini 3 Pro · Google · 86.6%
5. Qwen3.5-122B-A10B · Alibaba · 86.2%
+4 more
DynaMath

1 model

1. Qwen3.6-27B · Alibaba · 85.6%
m Star

1 model

1. Qwen3.6-27B · Alibaba · 81.4%
MMLongBench-Doc

1 model

1. Nemotron 3 Nano Omni 30B A3B · NVIDIA · 57.5%
CC-OCR

2 models

1. Qwen3.6-35B-A3B · Alibaba · 81.9%
2. Qwen3.6-27B · Alibaba · 81.2%
AI2D (test)

2 models

1. Qwen3.6-35B-A3B · Alibaba · 92.7%
2. Nemotron 3 Nano Omni 30B A3B · NVIDIA · 88.5%
CountBench

2 models

1. Qwen3.6-27B · Alibaba · 97.8%
2. LFM2.5-VL-450M · LiquidAI · 73.3%
RefCOCO (avg)

4 models

1. Qwen3.6-27B · Alibaba · 92.5%
2. Qwen3.6-35B-A3B · Alibaba · 92.0%
3. Nemotron 3 Nano Omni 30B A3B · NVIDIA · 90.5%
4. Interfaze Beta · Interfaze · 82.1%
ODinW-13

1 model

1. Qwen3.6-35B-A3B · Alibaba · 50.8%
ERQA

6 models

1. Gemini 3.1 Pro · Google · 69.4%
2. GPT-5.4 · OpenAI · 65.4%
3. Muse Spark · Meta · 64.7%
4. Qwen3.6-27B · Alibaba · 62.5%
5. Grok 4.20 · xAI · 54.1%
+1 more
Video-MMMU

8 models

1. Gemini 3 Pro · Google · 87.6%
2. Kimi K2.5 · Moonshot AI · 86.6%
3. Qwen3.5 397B · Alibaba · 84.7%
4. MiniMax M3 · MiniMax · 84.6%
5. Claude Opus 4.5 · Anthropic · 84.4%
+3 more
MLVU (avg)

2 models

1. Qwen3.6-27B · Alibaba · 86.6%
2. Qwen3.6-35B-A3B · Alibaba · 86.2%
MMVU

4 models

1. Kimi K2.5 · Moonshot AI · 80.4%
2. Qwen3.5-122B-A10B · Alibaba · 74.7%
3. Qwen3.5-27B · Alibaba · 73.3%
4. Qwen3.5-35B-A3B · Alibaba · 72.3%
ScreenSpot-Pro

14 models

1. Claude Opus 4.8 · Anthropic · 87.9%
2. GPT-5.4 · OpenAI · 85.4%
3. Gemini 3.1 Pro · Google · 84.4%
4. Muse Spark · Meta · 84.1%
5. Claude Opus 4.6 · Anthropic · 83.1%
+9 more
MedXpertQA MM

5 models

1. Gemini 3.1 Pro · Google · 81.3%
2. Muse Spark · Meta · 78.4%
3. GPT-5.4 · OpenAI · 77.1%
4. Grok 4.20 · xAI · 65.8%
5. Claude Opus 4.6 · Anthropic · 64.8%
ZeroBench

3 models

1. GPT-5.4 · OpenAI · 41.0%
2. Muse Spark · Meta · 33.0%
3. Gemini 3.1 Pro · Google · 29.0%
SimpleVQA

7 models

1. Step 3.7 Flash · StepFun · 79.2%
2. Gemini 3.1 Pro · Google · 72.4%
3. Muse Spark · Meta · 71.3%
4. GPT-5.4 · OpenAI · 61.1%
5. Qwen3.6-35B-A3B · Alibaba · 58.9%
+2 more
V*

11 models

1. Kimi K2.6 · Moonshot AI · 96.9%
2. Qwen3.6 Plus · Alibaba · 96.9%
3. Qwen3.5 397B · Alibaba · 95.8%
4. Step 3.7 Flash · StepFun · 95.3%
5. Qwen3.6-27B · Alibaba · 94.7%
+6 more
CharXiv

22 models

1. Claude Mythos Preview · Anthropic · 93.2%
2. Claude Opus 4.7 (Adaptive) · Anthropic · 91%
3. Claude Opus 4.8 · Anthropic · 89.9%
4. Muse Spark · Meta · 86.4%
5. Gemini 3.5 Flash · Google · 84.2%
+17 more
CharXiv (no tools)

3 models

1. Claude Mythos Preview · Anthropic · 86.1%
2. Claude Opus 4.7 (Adaptive) · Anthropic · 82.1%
3. Claude Opus 4.8 · Anthropic · 80.5%
Blueprint-Bench 2

1 model

1. Gemini 3.5 Flash · Google · 33.6%