context.vn

SuperGPQA

18 models evaluated

#   Model                    Provider     Type    Score
1   Claude Opus 4.6          Anthropic    Closed  95%
2   Claude Sonnet 4.6        Anthropic    Closed  95%
3   Qwen 3.6 Max (preview)   Alibaba      Closed  73.9%
4   Qwen3.7 Max              Alibaba      Closed  73.6%
5   Qwen3.6 Plus             Alibaba      Closed  71.6%
6   Claude Opus 4.5          Anthropic    Closed  70.6%
7   Qwen3.5 397B             Alibaba      Open    70.4%
8   Kimi K2.5                Moonshot AI  Open    69.2%
9   Qwen3.5-122B-A10B        Alibaba      Open    67.1%
10  GLM-5                    Z.AI         Open    66.8%
11  Qwen3.6-27B              Alibaba      Open    66%
12  Qwen3.5-27B              Alibaba      Open    65.6%
13  Qwen3.6-35B-A3B          Alibaba      Open    64.7%
14  Qwen3.5-35B-A3B          Alibaba      Open    63.4%
15  Qwen3 235B 2507          Alibaba      Open    62.6%
16  DeepSeek V4 Pro Base     DeepSeek     Open    53.9%
17  DeepSeek V4 Flash Base   DeepSeek     Open    46.5%
18  MiniCPM5-1B              OpenBMB      Open    23.1%