context.vn

GPQA

54 models evaluated

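| # | Model | Provider | Type | Score |
|---|-------|----------|------|-------|
| 1 | Claude Mythos Preview | Anthropic | Closed | 94.5% |
| 2 | Claude Opus 4.7 (Adaptive) | Anthropic | Closed | 94.2% |
| 3 | Claude Opus 4.8 | Anthropic | Closed | 93.6% |
| 4 | GPT-5.5 | OpenAI | Closed | 93.6% |
| 5 | GPT-5.4 | OpenAI | Closed | 92.8% |
| 6 | Qwen3.7 Max | Alibaba | Closed | 92.4% |
| 7 | GPT-5.2 | OpenAI | Closed | 92.4% |
| 8 | Gemini 3.5 Flash | Google | Closed | 92.2% |
| 9 | Claude Opus 4.6 | Anthropic | Closed | 91.3% |
| 10 | Kimi K2.6 | Moonshot AI | Open | 90.5% |
| 11 | Qwen3.6 Plus | Alibaba | Closed | 90.4% |
| 12 | DeepSeek V4 Pro (Max) | DeepSeek | Open | 90.1% |
| 13 | Grok 4.3 | xAI | Closed | 90.1% |
| 14 | Claude Sonnet 4.6 | Anthropic | Closed | 89.9% |
| 15 | Interfaze Beta | Interfaze | Closed | 89.9% |
| 16 | DeepSeek V4 Pro (High) | DeepSeek | Open | 89.1% |
| 17 | Qwen3.5 397B | Alibaba | Open | 88.4% |
| 18 | DeepSeek V4 Flash (Max) | DeepSeek | Open | 88.1% |
| 19 | GPT-5.4 mini | OpenAI | Closed | 88% |
| 20 | Qwen3.6-27B | Alibaba | Open | 87.8% |
| 21 | Kimi K2.5 (Reasoning) | Moonshot AI | Closed | 87.6% |
| 22 | Kimi K2.5 | Moonshot AI | Open | 87.6% |
| 23 | DeepSeek V4 Flash (High) | DeepSeek | Open | 87.4% |
| 24 | Hy3 Preview | Tencent | Open | 87.2% |
| 25 | Claude Opus 4.5 | Anthropic | Closed | 87% |
| 26 | Qwen3.5-122B-A10B | Alibaba | Open | 86.6% |
| 27 | GLM-5 | Z.AI | Open | 86% |
| 28 | Qwen3.6-35B-A3B | Alibaba | Open | 86% |
| 29 | GLM-4.7 | Z.AI | Open | 85.7% |
| 30 | Qwen3.5-27B | Alibaba | Open | 85.5% |
| 31 | Gemma 4 31B | Google | Open | 84.3% |
| 32 | Qwen3.5-35B-A3B | Alibaba | Open | 84.2% |
| 33 | MiMo-V2-Flash | Xiaomi | Open | 83.7% |
| 34 | Claude Sonnet 4.5 | Anthropic | Closed | 83.4% |
| 35 | Gemini 2.5 Pro | Google | Closed | 83% |
| 36 | GPT-5.4 nano | OpenAI | Closed | 82.8% |
| 37 | o1-pro | OpenAI | Closed | 79% |
| 38 | Qwen3 235B 2507 | Alibaba | Open | 77.5% |
| 39 | o3-mini | OpenAI | Closed | 77.2% |
| 40 | o1 | OpenAI | Closed | 75.7% |
| 41 | DeepSeek V4 Pro | DeepSeek | Open | 72.9% |
| 42 | Nemotron 3 Nano Omni 30B A3B | NVIDIA | Open | 72.2% |
| 43 | DeepSeek V4 Flash | DeepSeek | Open | 71.2% |
| 44 | ZAYA1-8B | Zyphra | Open | 71% |
| 45 | GPT-4.1 | OpenAI | Closed | 66.3% |
| 46 | GPT-4.1 mini | OpenAI | Closed | 64.2% |
| 47 | Claude 3.5 Sonnet | Anthropic | Closed | 59.4% |
| 48 | DeepSeek V3 | DeepSeek | Open | 59.1% |
| 49 | Ling 2.6 Flash | InclusionAI | Open | 59% |
| 50 | Gemma 4 E4B | Google | Open | 58.6% |
| 51 | ZAYA1-74B-Preview | Zyphra | Open | 57.3% |
| 52 | GPT-4.1 nano | OpenAI | Closed | 50.3% |
| 53 | Gemma 4 E2B | Google | Open | 43.4% |
| 54 | LFM2.5-VL-450M | LiquidAI | Open | 25.7% |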