context.vn

MMLU-Redux

8 models evaluated

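| # | Model | Provider | Type | Score |
|---|-------|----------|------|-------|
| 1 | Claude Opus 4.5 | Anthropic | Closed | 96.6% |
| 2 | Qwen3.7 Max | Alibaba | Closed | 95% |
| 3 | Qwen3.5 397B | Alibaba | Open | 94.9% |
| 4 | Qwen3.6 Plus | Alibaba | Closed | 94.5% |
| 5 | Qwen3.6-27B | Alibaba | Open | 93.5% |
| 6 | DeepSeek V4 Pro Base | DeepSeek | Open | 90.8% |
| 7 | DeepSeek V4 Flash Base | DeepSeek | Open | 89.4% |
| 8 | MiniCPM5-1B | OpenBMB | Open | 70.1% |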