context.vn

GDPval-AA Normalized

114 models evaluated

# | Model | Provider | Type | Score
1 | GPT-5.5 | OpenAI | Closed | 63.5%
2 | Claude Opus 4.7 (Adaptive) | Anthropic | Closed | 62.6%
3 | Claude Opus 4.7 | Anthropic | Closed | 59.0%
4 | GPT-5.4 | OpenAI | Closed | 58.7%
5 | Gemini 3.5 Flash | Google | Closed | 57.8%
6 | Claude Opus 4.6 (Adaptive) | Anthropic | Closed | 55.9%
7 | Claude Sonnet 4.6 | Anthropic | Closed | 54.8%
8 | Claude Opus 4.6 | Anthropic | Closed | 54.4%
9 | MiMo-V2.5-Pro | Xiaomi | Closed | 53.6%
10 | DeepSeek V4 Pro (High) | DeepSeek | Open | 52.9%
11 | DeepSeek V4 Pro (Max) | DeepSeek | Open | 52.7%
12 | Qwen3.7 Max | Alibaba | Closed | 52.3%
13 | GLM-5.1 | Z.AI | Open | 51.8%
14 | MiniMax M2.7 | MiniMax | Open | 50.2%
15 | Qwen 3.6 Max (preview) | Alibaba | Closed | 50.2%
16 | Grok 4.3 | xAI | Closed | 49.8%
17 | GLM-5-Turbo | Z.AI | Closed | 49.8%
18 | Kimi K2.6 | Moonshot AI | Open | 49.1%
19 | GPT-5.3 Codex | OpenAI | Closed | 48.9%
20 | GPT-5.2 | OpenAI | Closed | 48.4%
21 | Claude Opus 4.5 Thinking | Anthropic | Closed | 47.5%
22 | GPT-5.4 mini | OpenAI | Closed | 46.9%
23 | Claude Opus 4.5 | Anthropic | Closed | 46.0%
24 | Muse Spark | Meta | Closed | 45.9%
25 | Step 3.7 Flash | StepFun | Open | 45.8%
26 | DeepSeek V4 Flash (High) | DeepSeek | Open | 45.7%
27 | MiMo-V2-Pro | Xiaomi | Closed | 45.4%
28 | Qwen3.6-27B | Alibaba | Open | 45.2%
29 | GLM-5 | Z.AI | Open | 44.6%
30 | DeepSeek V4 Flash (Max) | DeepSeek | Open | 44.4%
31 | Qwen3.6 Plus | Alibaba | Closed | 42.6%
32 | GLM-5V-Turbo | Z.AI | Closed | 41.6%
33 | Gemini 3 Pro Deep Think | Google | Closed | 41.2%
34 | MiMo-V2-Omni | Xiaomi | Closed | 41.0%
35 | Gemini 3.1 Pro | Google | Closed | 40.7%
36 | Qwen3.6-35B-A3B | Alibaba | Open | 39.9%
37 | GPT-5 (high) | OpenAI | Closed | 39.8%
38 | GPT-5.2-Codex | OpenAI | Closed | 39.5%
39 | Kimi K2.5 (Reasoning) | Moonshot AI | Closed | 39.3%
40 | Kimi K2.5 | Moonshot AI | Open | 39.3%
41 | Hy3 Preview | Tencent | Open | 36.9%
42 | GPT-5.1 | OpenAI | Closed | 36.3%
43 | Qwen3.5 397B | Alibaba | Open | 36.1%
44 | GPT-5.1-Codex-Max | OpenAI | Closed | 34.6%
45 | GPT-5.4 nano | OpenAI | Closed | 34.6%
46 | GPT-5.1-Codex | OpenAI | Closed | 34.6%
47 | Qwen3.5 397B (Reasoning) | Alibaba | Open | 34.5%
48 | Gemini 3 Pro | Google | Closed | 34.3%
49 | GLM-4.7 | Z.AI | Open | 34.3%
50 | Mistral Medium 3.5 128B | Mistral | Open | 33.4%
51 | Qwen3.5-27B | Alibaba | Open | 33.0%
52 | Claude 4 Sonnet | Anthropic | Closed | 31.4%
53 | Qwen3.5-122B-A10B | Alibaba | Open | 30.8%
54 | Gemini 3 Flash | Google | Closed | 30.8%
55 | Gemma 4 31B | Google | Open | 30.7%
56 | DeepSeek V3.1 | DeepSeek | Open | 28.9%
57 | MiMo-V2-Flash | Xiaomi | Open | 28.1%
58 | Grok 4.1 Fast (Reasoning) | xAI | Closed | 27.2%
59 | Qwen3 Max | Alibaba | Closed | 27.0%
60 | Gemma 4 26B A4B | Google | Open | 25.7%
61 | Grok 4 Fast (Reasoning) | xAI | Closed | 25.6%
62 | GPT-5 (medium) | OpenAI | Closed | 25.0%
63 | Grok 4 | xAI | Closed | 24.4%
64 | GLM-4.6 | Z.AI | Open | 24.3%
65 | GPT-OSS 120B | OpenAI | Open | 22.4%
66 | Gemini 3.1 Flash-Lite | Google | Closed | 21.3%
67 | Command A+ | Cohere | Open | 21.0%
68 | Gemini 2.5 Pro | Google | Closed | 20.8%
69 | Qwen3.5-35B-A3B | Alibaba | Open | 20.4%
70 | DeepSeek V3.2 | DeepSeek | Open | 18.8%
71 | Trinity-Large-Preview | Arcee AI | Open | 18.3%
72 | Trinity-Large-Thinking | Arcee AI | Open | 18.3%
73 | Mistral Large 3 | Mistral | Closed | 18.1%
74 | Mistral Small 4 (Reasoning) | Mistral | Open | 18.1%
75 | Mistral Small 4 | Mistral | Open | 18.1%
76 | K-Exaone | LG AI Research | Closed | 16.3%
77 | Grok 4.1 Fast | xAI | Closed | 14.2%
78 | Ling 2.6 Flash | InclusionAI | Open | 14.1%
79 | GPT-4.1 | OpenAI | Closed | 13.8%
80 | Grok Code Fast 1 | xAI | Closed | 13.2%
81 | Nemotron 3 Nano Omni 30B A3B | NVIDIA | Open | 13.2%
82 | o3 | OpenAI | Closed | 12.7%
83 | Gemini 2.5 Flash | Google | Closed | 12.1%
84 | Sarvam 105B | Sarvam | Open | 11.9%
85 | o1 | OpenAI | Closed | 11.8%
86 | DeepSeek-R1 | DeepSeek | Open | 9.0%
87 | GPT-OSS 20B | OpenAI | Open | 7.5%
88 | GPT-4.1 mini | OpenAI | Closed | 6.0%
89 | DeepSeek V3.1 (Reasoning) | DeepSeek | Open | 5.6%
90 | Mistral Medium 3 | Mistral | Closed | 4.3%
91 | GLM-4.5-Air | Z.AI | Closed | 3.0%
92 | Kimi K2 | Moonshot AI | Closed | 1.3%
93 | GPT-4o | OpenAI | Closed | 0.0%
94 | Llama 3.1 405B | Meta | Open | 0.0%
95 | Mistral Large 2 | Mistral | Closed | 0.0%
96 | DeepSeek V3 | DeepSeek | Open | 0.0%
97 | GPT-4.1 nano | OpenAI | Closed | 0.0%
98 | Nemotron 3 Nano 30B | NVIDIA | Open | 0.0%
99 | Claude 3 Haiku | Anthropic | Closed | 0.0%
100 | Llama 4 Scout | Meta | Open | 0.0%
101 | Nemotron Ultra 253B | NVIDIA | Open | 0.0%
102 | Gemma 3 27B | Google | Open | 0.0%
103 | Llama 4 Maverick | Meta | Open | 0.0%
104 | Nova Pro | Amazon | Closed | 0.0%
105 | Exaone 4.0 32B | LG AI Research | Open | 0.0%
106 | Sarvam 30B | Sarvam | Open | 0.0%
107 | Gemma 4 E4B | Google | Open | 0.0%
108 | Gemma 4 E2B | Google | Open | 0.0%
109 | Granite-4.0-1B | IBM | Open | 0.0%
110 | Granite-4.0-H-1B | IBM | Open | 0.0%
111 | Solar Pro 2 | Upstage | Closed | 0.0%
112 | Exaone 4.0 1.2B | LG AI Research | Open | 0.0%
113 | Granite-4.0-350M | IBM | Open | 0.0%
114 | Granite-4.0-H-350M | IBM | Open | 0.0%