context.vn

LongBench v2

10 models evaluated

#   Model                   Provider     Type    Score
1   Claude Opus 4.5         Anthropic    Closed  64.4%
2   Qwen3.5 397B            Alibaba      Open    63.2%
3   Qwen3.6 Plus            Alibaba      Closed  62%
4   Kimi K2.5               Moonshot AI  Open    61%
5   GLM-5                   Z.AI         Open    60.8%
6   Qwen3.5-27B             Alibaba      Open    60.6%
7   Qwen3.5-122B-A10B       Alibaba      Open    60.2%
8   Qwen3.5-35B-A3B         Alibaba      Open    59%
9   DeepSeek V4 Pro Base    DeepSeek     Open    51.5%
10  DeepSeek V4 Flash Base  DeepSeek     Open    44.7%