context.vn

aa GPQA Diamond

122 models evaluated

| # | Model | Provider | Type | Score |
|---|-------|----------|------|-------|
| 1 | Gemini 3.1 Pro | Google | Closed | 94.1% |
| 2 | GPT-5.5 | OpenAI | Closed | 93.5% |
| 3 | Qwen3.7 Max | Alibaba | Closed | 92.3% |
| 4 | Gemini 3.5 Flash | Google | Closed | 92.2% |
| 5 | GPT-5.4 | OpenAI | Closed | 92.0% |
| 6 | GPT-5.3 Codex | OpenAI | Closed | 91.5% |
| 7 | Claude Opus 4.7 (Adaptive) | Anthropic | Closed | 91.4% |
| 8 | Kimi K2.6 | Moonshot AI | Open | 91.1% |
| 9 | Gemini 3 Pro | Google | Closed | 90.8% |
| 10 | DeepSeek V4 Pro (High) | DeepSeek | Open | 90.5% |
| 11 | GPT-5.2 | OpenAI | Closed | 90.3% |
| 12 | Grok 4.3 | xAI | Closed | 90.1% |
| 13 | GPT-5.2-Codex | OpenAI | Closed | 89.9% |
| 14 | Claude Opus 4.6 (Adaptive) | Anthropic | Closed | 89.6% |
| 15 | DeepSeek V4 Flash (Max) | DeepSeek | Open | 89.4% |
| 16 | Qwen3.5 397B (Reasoning) | Alibaba | Open | 89.3% |
| 17 | DeepSeek V4 Pro (Max) | DeepSeek | Open | 88.8% |
| 18 | Qwen 3.6 Max (preview) | Alibaba | Closed | 88.8% |
| 19 | Claude Opus 4.7 | Anthropic | Closed | 88.5% |
| 20 | Muse Spark | Meta | Closed | 88.4% |
| 21 | Qwen3.6 Plus | Alibaba | Closed | 88.2% |
| 22 | Kimi K2.5 (Reasoning) | Moonshot AI | Closed | 87.9% |
| 23 | Kimi K2.5 | Moonshot AI | Open | 87.9% |
| 24 | Grok 4 | xAI | Closed | 87.7% |
| 25 | GPT-5.4 mini | OpenAI | Closed | 87.5% |
| 26 | MiniMax M2.7 | MiniMax | Open | 87.4% |
| 27 | GPT-5.1 | OpenAI | Closed | 87.3% |
| 28 | MiMo-V2-Pro | Xiaomi | Closed | 87.0% |
| 29 | GLM-5.1 | Z.AI | Open | 86.8% |
| 30 | DeepSeek V4 Flash (High) | DeepSeek | Open | 86.7% |
| 31 | Hy3 Preview | Tencent | Open | 86.7% |
| 32 | MiMo-V2.5-Pro | Xiaomi | Closed | 86.6% |
| 33 | Claude Opus 4.5 Thinking | Anthropic | Closed | 86.6% |
| 34 | Qwen3.5 397B | Alibaba | Open | 86.1% |
| 35 | GPT-5.1-Codex-Max | OpenAI | Closed | 86.0% |
| 36 | GPT-5.1-Codex | OpenAI | Closed | 86.0% |
| 37 | GLM-4.7 | Z.AI | Open | 85.9% |
| 38 | Qwen3.5-27B | Alibaba | Open | 85.8% |
| 39 | Qwen3.5-122B-A10B | Alibaba | Open | 85.7% |
| 40 | Gemma 4 31B | Google | Open | 85.7% |
| 41 | GPT-5 (high) | OpenAI | Closed | 85.4% |
| 42 | Grok 4.1 Fast (Reasoning) | xAI | Closed | 85.3% |
| 43 | GLM-5-Turbo | Z.AI | Closed | 84.7% |
| 44 | Grok 4 Fast (Reasoning) | xAI | Closed | 84.7% |
| 45 | o3-pro | OpenAI | Closed | 84.5% |
| 46 | Qwen3.5-35B-A3B | Alibaba | Open | 84.5% |
| 47 | Gemini 2.5 Pro | Google | Closed | 84.4% |
| 48 | Qwen3.6-27B | Alibaba | Open | 84.2% |
| 49 | GPT-5 (medium) | OpenAI | Closed | 84.2% |
| 50 | Qwen3.6-35B-A3B | Alibaba | Open | 84.1% |
| 51 | Claude Opus 4.6 | Anthropic | Closed | 84.0% |
| 52 | MiMo-V2-Omni | Xiaomi | Closed | 82.8% |
| 53 | o3 | OpenAI | Closed | 82.7% |
| 54 | Gemini 3.1 Flash-Lite | Google | Closed | 82.2% |
| 55 | GLM-5 | Z.AI | Open | 82.0% |
| 56 | GPT-5.4 nano | OpenAI | Closed | 81.7% |
| 57 | DeepSeek-R1 | DeepSeek | Open | 81.3% |
| 58 | Gemini 3 Flash | Google | Closed | 81.2% |
| 59 | Claude Opus 4.5 | Anthropic | Closed | 81.0% |
| 60 | Claude 4.1 Opus Thinking | Anthropic | Closed | 80.9% |
| 61 | GLM-5V-Turbo | Z.AI | Closed | 80.9% |
| 62 | Claude Sonnet 4.6 | Anthropic | Closed | 79.9% |
| 63 | Gemma 4 26B A4B | Google | Open | 79.2% |
| 64 | K-Exaone | LG AI Research | Closed | 78.3% |
| 65 | GPT-OSS 120B | OpenAI | Open | 78.2% |
| 66 | DeepSeek V3.1 (Reasoning) | DeepSeek | Open | 77.9% |
| 67 | Mistral Small 4 (Reasoning) | Mistral | Open | 76.9% |
| 68 | Mistral Small 4 | Mistral | Open | 76.9% |
| 69 | Kimi K2 | Moonshot AI | Closed | 76.6% |
| 70 | Qwen3 Max | Alibaba | Closed | 76.4% |
| 71 | Command A+ | Cohere | Open | 76.1% |
| 72 | Trinity-Large-Preview | Arcee AI | Open | 75.2% |
| 73 | Trinity-Large-Thinking | Arcee AI | Open | 75.2% |
| 74 | DeepSeek V3.2 | DeepSeek | Open | 75.1% |
| 75 | o3-mini | OpenAI | Closed | 74.8% |
| 76 | Mistral Medium 3.5 128B | Mistral | Open | 74.8% |
| 77 | o1 | OpenAI | Closed | 74.7% |
| 78 | Sarvam 105B | Sarvam | Open | 73.8% |
| 79 | DeepSeek V3.1 | DeepSeek | Open | 73.5% |
| 80 | GLM-4.5-Air | Z.AI | Closed | 73.3% |
| 81 | Nemotron Ultra 253B | NVIDIA | Open | 72.8% |
| 82 | Grok Code Fast 1 | xAI | Closed | 72.7% |
| 83 | GPT-OSS 20B | OpenAI | Open | 68.8% |
| 84 | Claude 4 Sonnet | Anthropic | Closed | 68.3% |
| 85 | Gemini 2.5 Flash | Google | Closed | 68.3% |
| 86 | Mistral Large 3 | Mistral | Closed | 68.0% |
| 87 | Llama 4 Maverick | Meta | Open | 67.1% |
| 88 | GPT-4.1 | OpenAI | Closed | 66.6% |
| 89 | GPT-4.1 mini | OpenAI | Closed | 66.4% |
| 90 | MiMo-V2-Flash | Xiaomi | Open | 65.6% |
| 91 | Grok 4.1 Fast | xAI | Closed | 63.7% |
| 92 | Sarvam 30B | Sarvam | Open | 63.3% |
| 93 | GLM-4.6 | Z.AI | Open | 63.2% |
| 94 | Exaone 4.0 32B | LG AI Research | Open | 62.8% |
| 95 | DeepSeek R1 Distill Qwen 32B | DeepSeek | Open | 61.5% |
| 96 | Ling 2.6 Flash | InclusionAI | Open | 59.3% |
| 97 | Gemini 1.5 Pro | Google | Closed | 58.9% |
| 98 | Llama 4 Scout | Meta | Open | 58.7% |
| 99 | Mistral Medium 3 | Mistral | Closed | 57.8% |
| 100 | Gemma 4 E4B | Google | Open | 57.6% |
| 101 | Phi-4 | Microsoft | Open | 57.5% |
| 102 | Solar Pro 2 | Upstage | Closed | 56.1% |
| 103 | DeepSeek V3 | DeepSeek | Open | 55.7% |
| 104 | GPT-4o | OpenAI | Closed | 54.3% |
| 105 | Llama 3.1 405B | Meta | Open | 51.5% |
| 106 | GPT-4.1 nano | OpenAI | Closed | 51.2% |
| 107 | Nova Pro | Amazon | Closed | 49.9% |
| 108 | Claude 3 Opus | Anthropic | Closed | 48.9% |
| 109 | Mistral Large 2 | Mistral | Closed | 48.6% |
| 110 | Nemotron 3 Nano Omni 30B A3B | NVIDIA | Open | 46.9% |
| 111 | Gemma 4 E2B | Google | Open | 43.3% |
| 112 | Gemma 3 27B | Google | Open | 42.8% |
| 113 | GPT-4o mini | OpenAI | Closed | 42.6% |
| 114 | Exaone 4.0 1.2B | LG AI Research | Open | 42.4% |
| 115 | Qwen2.5 Coder 32B Instruct | Alibaba | Open | 41.7% |
| 116 | Nemotron 3 Nano 30B | NVIDIA | Open | 39.9% |
| 117 | Claude 3 Haiku | Anthropic | Closed | 37.4% |
| 118 | Granite-4.0-1B | IBM | Open | 28.1% |
| 119 | Gemini 1.0 Pro | Google | Closed | 27.7% |
| 120 | Granite-4.0-H-1B | IBM | Open | 26.3% |
| 121 | Granite-4.0-350M | IBM | Open | 26.1% |
| 122 | Granite-4.0-H-350M | IBM | Open | 25.7% |