context.vn

HLE

36 models evaluated

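| # | Model | Provider | Type | Score |
|---|-------|----------|------|-------|
| 1 | Claude Mythos Preview | Anthropic | Closed | 64.7% |
| 2 | GPT-5.4 Pro | OpenAI | Closed | 58.7% |
| 3 | Claude Opus 4.8 | Anthropic | Closed | 57.9% |
| 4 | GPT-5.5 Pro | OpenAI | Closed | 57.2% |
| 5 | Claude Opus 4.7 (Adaptive) | Anthropic | Closed | 54.7% |
| 6 | Claude Opus 4.6 | Anthropic | Closed | 53% |
| 7 | GLM-5.1 | Z.AI | Open | 52.3% |
| 8 | GPT-5.5 | OpenAI | Closed | 52.2% |
| 9 | GPT-5.4 | OpenAI | Closed | 52.1% |
| 10 | GLM-5 | Z.AI | Open | 50.4% |
| 11 | Muse Spark | Meta | Closed | 50.4% |
| 12 | Claude Sonnet 4.6 | Anthropic | Closed | 49% |
| 13 | MiMo-V2.5-Pro | Xiaomi | Closed | 48% |
| 14 | GPT-5.4 mini | OpenAI | Closed | 41.5% |
| 15 | Qwen3.7 Max | Alibaba | Closed | 41.4% |
| 16 | Gemini 3.5 Flash | Google | Closed | 40.2% |
| 17 | DeepSeek V4 Pro (Max) | DeepSeek | Open | 37.7% |
| 18 | GPT-5.4 nano | OpenAI | Closed | 37.7% |
| 19 | Grok 4.3 | xAI | Closed | 35% |
| 20 | DeepSeek V4 Flash (Max) | DeepSeek | Open | 34.8% |
| 21 | Kimi K2.6 | Moonshot AI | Open | 34.7% |
| 22 | DeepSeek V4 Pro (High) | DeepSeek | Open | 34.5% |
| 23 | Claude Opus 4.5 | Anthropic | Closed | 30.8% |
| 24 | Kimi K2.5 | Moonshot AI | Open | 30.1% |
| 25 | DeepSeek V4 Flash (High) | DeepSeek | Open | 29.4% |
| 26 | Qwen3.6 Plus | Alibaba | Closed | 28.8% |
| 27 | Qwen3.5 397B | Alibaba | Open | 28.7% |
| 28 | Gemma 4 31B | Google | Open | 26.5% |
| 29 | Hy3 Preview | Tencent | Open | 25.5% |
| 30 | GLM-4.7 | Z.AI | Open | 24.8% |
| 31 | Qwen3.6-27B | Alibaba | Open | 24% |
| 32 | Qwen3.6-35B-A3B | Alibaba | Open | 21.4% |
| 33 | Gemini 2.5 Pro | Google | Closed | 18.8% |
| 34 | Gemma 4 26B A4B | Google | Open | 17.2% |
| 35 | DeepSeek V4 Flash | DeepSeek | Open | 8.1% |
| 36 | DeepSeek V4 Pro | DeepSeek | Open | 7.7% |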