context.vn

Terminal-Bench Hard

115 models evaluated

| # | Model | Provider | Type | Score |
|---|-------|----------|------|-------|
| 1 | GPT-5.5 | OpenAI | Closed | 60.6% |
| 2 | GPT-5.4 | OpenAI | Closed | 57.6% |
| 3 | Claude Opus 4.7 | Anthropic | Closed | 54.5% |
| 4 | Gemini 3.1 Pro | Google | Closed | 53.8% |
| 5 | GPT-5.3 Codex | OpenAI | Closed | 53.0% |
| 6 | GPT-5.4 mini | OpenAI | Closed | 52.3% |
| 7 | Claude Opus 4.7 (Adaptive) | Anthropic | Closed | 51.5% |
| 8 | Qwen3.7 Max | Alibaba | Closed | 50.8% |
| 9 | Claude Opus 4.6 | Anthropic | Closed | 48.5% |
| 10 | GPT-5.2 | OpenAI | Closed | 47.0% |
| 11 | Claude Opus 4.5 Thinking | Anthropic | Closed | 47.0% |
| 12 | DeepSeek V4 Pro (Max) | DeepSeek | Open | 46.2% |
| 13 | Claude Sonnet 4.6 | Anthropic | Closed | 46.2% |
| 14 | Claude Opus 4.6 (Adaptive) | Anthropic | Closed | 46.2% |
| 15 | GPT-5.1 | OpenAI | Closed | 45.5% |
| 16 | Muse Spark | Meta | Closed | 45.5% |
| 17 | Kimi K2.6 | Moonshot AI | Open | 43.9% |
| 18 | Qwen3.6 Plus | Alibaba | Closed | 43.9% |
| 19 | Qwen 3.6 Max (preview) | Alibaba | Closed | 43.9% |
| 20 | GLM-5.1 | Z.AI | Open | 43.2% |
| 21 | GLM-5 | Z.AI | Open | 43.2% |
| 22 | MiMo-V2.5-Pro | Xiaomi | Closed | 43.2% |
| 23 | GPT-5.4 nano | OpenAI | Closed | 42.4% |
| 24 | DeepSeek V4 Pro (High) | DeepSeek | Open | 41.7% |
| 25 | Gemini 3 Pro | Google | Closed | 41.7% |
| 26 | Gemini 3.5 Flash | Google | Closed | 40.9% |
| 27 | Qwen3.5 397B (Reasoning) | Alibaba | Open | 40.9% |
| 28 | Claude Opus 4.5 | Anthropic | Closed | 40.9% |
| 29 | MiMo-V2-Pro | Xiaomi | Closed | 40.9% |
| 30 | MiniMax M2.7 | MiniMax | Open | 39.4% |
| 31 | DeepSeek V4 Flash (High) | DeepSeek | Open | 38.6% |
| 32 | GPT-5 (medium) | OpenAI | Closed | 37.9% |
| 33 | Grok 4 | xAI | Closed | 37.9% |
| 34 | Grok 4.3 | xAI | Closed | 37.9% |
| 35 | GPT-5.2-Codex | OpenAI | Closed | 37.1% |
| 36 | o3 | OpenAI | Closed | 37.1% |
| 37 | Gemma 4 31B | Google | Open | 36.4% |
| 38 | DeepSeek V4 Flash (Max) | DeepSeek | Open | 35.6% |
| 39 | Qwen3.5 397B | Alibaba | Open | 35.6% |
| 40 | Kimi K2.5 (Reasoning) | Moonshot AI | Closed | 34.8% |
| 41 | GPT-5.1-Codex-Max | OpenAI | Closed | 34.8% |
| 42 | Qwen3.6-27B | Alibaba | Open | 34.8% |
| 43 | Qwen3.6-35B-A3B | Alibaba | Open | 34.8% |
| 44 | Kimi K2.5 | Moonshot AI | Open | 34.8% |
| 45 | MiMo-V2-Omni | Xiaomi | Closed | 34.8% |
| 46 | GPT-5.1-Codex | OpenAI | Closed | 34.8% |
| 47 | Claude 4.1 Opus Thinking | Anthropic | Closed | 34.3% |
| 48 | Hy3 Preview | Tencent | Open | 34.1% |
| 49 | Mistral Medium 3.5 128B | Mistral | Open | 33.3% |
| 50 | GLM-5-Turbo | Z.AI | Closed | 33.3% |
| 51 | GPT-5 (high) | OpenAI | Closed | 32.6% |
| 52 | Qwen3.5-27B | Alibaba | Open | 32.6% |
| 53 | DeepSeek V3.2 | DeepSeek | Open | 32.6% |
| 54 | GLM-5V-Turbo | Z.AI | Closed | 32.6% |
| 55 | GLM-4.7 | Z.AI | Open | 31.8% |
| 56 | Gemini 3 Flash | Google | Closed | 31.8% |
| 57 | Qwen3.5-122B-A10B | Alibaba | Open | 31.1% |
| 58 | GLM-4.6 | Z.AI | Open | 28.8% |
| 59 | Claude 4 Sonnet | Anthropic | Closed | 27.3% |
| 60 | Gemini 2.5 Pro | Google | Closed | 26.5% |
| 61 | Qwen3.5-35B-A3B | Alibaba | Open | 26.5% |
| 62 | MiMo-V2-Flash | Xiaomi | Open | 25.8% |
| 63 | DeepSeek V3.1 (Reasoning) | DeepSeek | Open | 25.0% |
| 64 | Command A+ | Cohere | Open | 25.0% |
| 65 | Gemini 3.1 Flash-Lite | Google | Closed | 24.2% |
| 66 | DeepSeek V3.1 | DeepSeek | Open | 24.2% |
| 67 | Grok 4.1 Fast (Reasoning) | xAI | Closed | 24.2% |
| 68 | GPT-OSS 120B | OpenAI | Open | 23.5% |
| 69 | K-Exaone | LG AI Research | Closed | 22.7% |
| 70 | Trinity-Large-Preview | Arcee AI | Open | 22.7% |
| 71 | Trinity-Large-Thinking | Arcee AI | Open | 22.7% |
| 72 | Ling 2.6 Flash | InclusionAI | Open | 21.2% |
| 73 | GLM-4.5-Air | Z.AI | Closed | 20.5% |
| 74 | Qwen3 Max | Alibaba | Closed | 20.5% |
| 75 | Grok 4 Fast (Reasoning) | xAI | Closed | 18.9% |
| 76 | Grok Code Fast 1 | xAI | Closed | 17.4% |
| 77 | Mistral Small 4 (Reasoning) | Mistral | Open | 17.4% |
| 78 | Mistral Small 4 | Mistral | Open | 17.4% |
| 79 | Mistral Large 3 | Mistral | Closed | 15.9% |
| 80 | Kimi K2 | Moonshot AI | Closed | 15.9% |
| 81 | DeepSeek-R1 | DeepSeek | Open | 15.9% |
| 82 | Grok 4.1 Fast | xAI | Closed | 14.4% |
| 83 | GPT-4.1 | OpenAI | Closed | 13.6% |
| 84 | Gemma 4 26B A4B | Google | Open | 13.6% |
| 85 | o1 | OpenAI | Closed | 12.9% |
| 86 | Gemini 2.5 Flash | Google | Closed | 12.1% |
| 87 | Nemotron 3 Nano 30B | NVIDIA | Open | 12.1% |
| 88 | GPT-OSS 20B | OpenAI | Open | 10.6% |
| 89 | GPT-4o | OpenAI | Closed | 8.3% |
| 90 | Nemotron 3 Nano Omni 30B A3B | NVIDIA | Open | 8.3% |
| 91 | Gemma 4 E4B | Google | Open | 8.3% |
| 92 | GPT-4.1 mini | OpenAI | Closed | 7.6% |
| 93 | o3-mini | OpenAI | Closed | 6.8% |
| 94 | Llama 3.1 405B | Meta | Open | 6.8% |
| 95 | DeepSeek V3 | DeepSeek | Open | 6.8% |
| 96 | Llama 4 Maverick | Meta | Open | 6.8% |
| 97 | Mistral Large 2 | Mistral | Closed | 6.1% |
| 98 | Nova Pro | Amazon | Closed | 6.1% |
| 99 | Solar Pro 2 | Upstage | Closed | 4.5% |
| 100 | Phi-4 | Microsoft | Open | 3.8% |
| 101 | GPT-4.1 nano | OpenAI | Closed | 3.8% |
| 102 | Gemma 3 27B | Google | Open | 3.8% |
| 103 | Mistral Medium 3 | Mistral | Closed | 3.8% |
| 104 | Gemma 4 E2B | Google | Open | 3.0% |
| 105 | Nemotron Ultra 253B | NVIDIA | Open | 2.3% |
| 106 | Sarvam 30B | Sarvam | Open | 2.3% |
| 107 | Sarvam 105B | Sarvam | Open | 1.5% |
| 108 | Llama 4 Scout | Meta | Open | 1.5% |
| 109 | Exaone 4.0 32B | LG AI Research | Open | 1.5% |
| 110 | Claude 3 Haiku | Anthropic | Closed | 0.8% |
| 111 | Granite-4.0-1B | IBM | Open | 0.0% |
| 112 | Granite-4.0-H-1B | IBM | Open | 0.0% |
| 113 | Exaone 4.0 1.2B | LG AI Research | Open | 0.0% |
| 114 | Granite-4.0-350M | IBM | Open | 0.0% |
| 115 | Granite-4.0-H-350M | IBM | Open | 0.0% |