context.vn

AA IFBench

116 models evaluated

| # | Model | Provider | Type | Score |
|---|---|---|---|---|
| 1 | Grok 4.3 | xAI | Closed | 81.3% |
| 2 | Qwen3.7 Max | Alibaba | Closed | 80.5% |
| 3 | MiMo-V2.5-Pro | Xiaomi | Closed | 79.9% |
| 4 | DeepSeek V4 Flash (Max) | DeepSeek | Open | 79.2% |
| 5 | Qwen3.5 397B (Reasoning) | Alibaba | Open | 78.8% |
| 6 | GPT-5.2-Codex | OpenAI | Closed | 77.6% |
| 7 | Gemini 3.1 Flash-Lite | Google | Closed | 77.2% |
| 8 | Gemini 3.1 Pro | Google | Closed | 77.1% |
| 9 | Qwen 3.6 Max (preview) | Alibaba | Closed | 76.6% |
| 10 | DeepSeek V4 Pro (Max) | DeepSeek | Open | 76.5% |
| 11 | Gemini 3.5 Flash | Google | Closed | 76.3% |
| 12 | GLM-5.1 | Z.AI | Open | 76.3% |
| 13 | Kimi K2.6 | Moonshot AI | Open | 76.0% |
| 14 | GPT-5.5 | OpenAI | Closed | 75.9% |
| 15 | Muse Spark | Meta | Closed | 75.9% |
| 16 | GPT-5.4 nano | OpenAI | Closed | 75.9% |
| 17 | Qwen3.5-122B-A10B | Alibaba | Open | 75.7% |
| 18 | MiniMax M2.7 | MiniMax | Open | 75.7% |
| 19 | Qwen3.5-27B | Alibaba | Open | 75.6% |
| 20 | Gemma 4 31B | Google | Open | 75.6% |
| 21 | GPT-5.3 Codex | OpenAI | Closed | 75.4% |
| 22 | GPT-5.2 | OpenAI | Closed | 75.4% |
| 23 | Qwen3.6 Plus | Alibaba | Closed | 75.2% |
| 24 | GPT-5.4 | OpenAI | Closed | 73.9% |
| 25 | Command A+ | Cohere | Open | 73.9% |
| 26 | DeepSeek V4 Flash (High) | DeepSeek | Open | 73.5% |
| 27 | GPT-5.4 mini | OpenAI | Closed | 73.3% |
| 28 | GLM-5-Turbo | Z.AI | Closed | 73.2% |
| 29 | GPT-5 (high) | OpenAI | Closed | 73.1% |
| 30 | GPT-5.1 | OpenAI | Closed | 72.9% |
| 31 | Qwen3.5-35B-A3B | Alibaba | Open | 72.5% |
| 32 | Gemma 4 26B A4B | Google | Open | 72.4% |
| 33 | GLM-5 | Z.AI | Open | 72.3% |
| 34 | o3 | OpenAI | Closed | 71.4% |
| 35 | DeepSeek V4 Pro (High) | DeepSeek | Open | 71.3% |
| 36 | GPT-5 (medium) | OpenAI | Closed | 70.6% |
| 37 | Gemini 3 Pro | Google | Closed | 70.4% |
| 38 | o1 | OpenAI | Closed | 70.3% |
| 39 | Kimi K2.5 (Reasoning) | Moonshot AI | Closed | 70.2% |
| 40 | Kimi K2.5 | Moonshot AI | Open | 70.2% |
| 41 | GPT-5.1-Codex-Max | OpenAI | Closed | 70.0% |
| 42 | GPT-5.1-Codex | OpenAI | Closed | 70.0% |
| 43 | GPT-OSS 120B | OpenAI | Open | 69.0% |
| 44 | MiMo-V2-Pro | Xiaomi | Closed | 68.8% |
| 45 | Mistral Medium 3.5 128B | Mistral | Open | 68.8% |
| 46 | GLM-4.7 | Z.AI | Open | 67.9% |
| 47 | Qwen3.6-27B | Alibaba | Open | 67.6% |
| 48 | GPT-OSS 20B | OpenAI | Open | 65.1% |
| 49 | K-Exaone | LG AI Research | Closed | 64.7% |
| 50 | Qwen3.6-35B-A3B | Alibaba | Open | 64.4% |
| 51 | Nemotron 3 Nano Omni 30B A3B | NVIDIA | Open | 63.2% |
| 52 | Hy3 Preview | Tencent | Open | 63.1% |
| 53 | GLM-5V-Turbo | Z.AI | Closed | 61.1% |
| 54 | Claude Opus 4.7 (Adaptive) | Anthropic | Closed | 58.6% |
| 55 | Claude Opus 4.5 Thinking | Anthropic | Closed | 58.0% |
| 56 | Ling 2.6 Flash | InclusionAI | Open | 57.4% |
| 57 | Trinity-Large-Preview | Arcee AI | Open | 56.3% |
| 58 | Trinity-Large-Thinking | Arcee AI | Open | 56.3% |
| 59 | Claude 4.1 Opus Thinking | Anthropic | Closed | 55.4% |
| 60 | Gemini 3 Flash | Google | Closed | 55.1% |
| 61 | Grok 4 | xAI | Closed | 53.7% |
| 62 | MiMo-V2-Omni | Xiaomi | Closed | 53.5% |
| 63 | Claude Opus 4.6 (Adaptive) | Anthropic | Closed | 53.1% |
| 64 | Grok 4.1 Fast (Reasoning) | xAI | Closed | 52.7% |
| 65 | Qwen3.5 397B | Alibaba | Open | 51.6% |
| 66 | Grok 4 Fast (Reasoning) | xAI | Closed | 50.5% |
| 67 | DeepSeek V3.2 | DeepSeek | Open | 49.0% |
| 68 | Gemini 2.5 Pro | Google | Closed | 48.7% |
| 69 | Mistral Small 4 (Reasoning) | Mistral | Open | 48.2% |
| 70 | Mistral Small 4 | Mistral | Open | 48.2% |
| 71 | Claude 4 Sonnet | Anthropic | Closed | 45.4% |
| 72 | Claude Opus 4.6 | Anthropic | Closed | 44.6% |
| 73 | Gemma 4 E4B | Google | Open | 44.2% |
| 74 | Qwen3 Max | Alibaba | Closed | 44.1% |
| 75 | Claude Opus 4.7 | Anthropic | Closed | 43.6% |
| 76 | Claude Opus 4.5 | Anthropic | Closed | 43.0% |
| 77 | GPT-4.1 | OpenAI | Closed | 43.0% |
| 78 | Llama 4 Maverick | Meta | Open | 43.0% |
| 79 | Kimi K2 | Moonshot AI | Closed | 41.5% |
| 80 | DeepSeek V3.1 (Reasoning) | DeepSeek | Open | 41.5% |
| 81 | Grok Code Fast 1 | xAI | Closed | 41.4% |
| 82 | Claude Sonnet 4.6 | Anthropic | Closed | 41.2% |
| 83 | MiMo-V2-Flash | Xiaomi | Open | 39.9% |
| 84 | DeepSeek-R1 | DeepSeek | Open | 39.6% |
| 85 | Llama 4 Scout | Meta | Open | 39.5% |
| 86 | Mistral Medium 3 | Mistral | Closed | 39.3% |
| 87 | Llama 3.1 405B | Meta | Open | 39.0% |
| 88 | Gemini 2.5 Flash | Google | Closed | 39.0% |
| 89 | GPT-4.1 mini | OpenAI | Closed | 38.3% |
| 90 | Nemotron Ultra 253B | NVIDIA | Open | 38.2% |
| 91 | Nova Pro | Amazon | Closed | 38.1% |
| 92 | Gemma 4 E2B | Google | Open | 38.0% |
| 93 | DeepSeek V3.1 | DeepSeek | Open | 37.8% |
| 94 | GLM-4.5-Air | Z.AI | Closed | 37.6% |
| 95 | Nemotron 3 Nano 30B | NVIDIA | Open | 37.5% |
| 96 | GLM-4.6 | Z.AI | Open | 36.7% |
| 97 | Grok 4.1 Fast | xAI | Closed | 36.5% |
| 98 | Mistral Large 3 | Mistral | Closed | 36.2% |
| 99 | Claude 3 Haiku | Anthropic | Closed | 36.1% |
| 100 | DeepSeek V3 | DeepSeek | Open | 34.8% |
| 101 | Sarvam 105B | Sarvam | Open | 34.4% |
| 102 | GPT-4o | OpenAI | Closed | 34.3% |
| 103 | Solar Pro 2 | Upstage | Closed | 33.7% |
| 104 | Exaone 4.0 32B | LG AI Research | Open | 33.5% |
| 105 | GPT-4.1 nano | OpenAI | Closed | 32.0% |
| 106 | Gemma 3 27B | Google | Open | 31.8% |
| 107 | Mistral Large 2 | Mistral | Closed | 31.2% |
| 108 | GPT-4o mini | OpenAI | Closed | 31.0% |
| 109 | Sarvam 30B | Sarvam | Open | 26.5% |
| 110 | Granite-4.0-H-1B | IBM | Open | 26.2% |
| 111 | Exaone 4.0 1.2B | LG AI Research | Open | 25.3% |
| 112 | Phi-4 | Microsoft | Open | 23.5% |
| 113 | DeepSeek R1 Distill Qwen 32B | DeepSeek | Open | 22.9% |
| 114 | Granite-4.0-1B | IBM | Open | 20.5% |
| 115 | Granite-4.0-H-350M | IBM | Open | 17.6% |
| 116 | Granite-4.0-350M | IBM | Open | 15.9% |