context.vn

GPQA Diamond

29 models evaluated

#   Model                         Provider     Type    Score
1   Gemini 3.1 Pro                Google       Closed  94.3%
2   Claude Opus 4.7 (Adaptive)    Anthropic    Closed  94.2%
3   Claude Opus 4.8               Anthropic    Closed  93.6%
4   GPT-5.5                       OpenAI       Closed  93.6%
5   GPT-5.4                       OpenAI       Closed  92.8%
6   Gemini 3.5 Flash              Google       Closed  92.7%
7   Qwen3.7 Max                   Alibaba      Closed  92.4%
8   Kimi K2.6                     Moonshot AI  Open    90.5%
9   DeepSeek V4 Pro (Max)         DeepSeek     Open    90.1%
10  Interfaze Beta                Interfaze    Closed  89.9%
11  Muse Spark                    Meta         Closed  89.5%
12  Claude Opus 4.6               Anthropic    Closed  89.2%
13  DeepSeek V4 Pro (High)        DeepSeek     Open    89.1%
14  Grok 4.20                     xAI          Closed  88.5%
15  DeepSeek V4 Flash (Max)       DeepSeek     Open    88.1%
16  Kimi K2.5                     Moonshot AI  Open    87.6%
17  DeepSeek V4 Flash (High)      DeepSeek     Open    87.4%
18  Hy3 Preview                   Tencent      Open    87.2%
19  MiniMax M2.7                  MiniMax      Open    87.0%
20  GLM-5.1                       Z.AI         Open    86.2%
21  GLM-5                         Z.AI         Open    86.0%
22  Trinity-Large-Thinking        Arcee AI     Open    76.3%
23  DeepSeek V4 Pro               DeepSeek     Open    72.9%
24  Nemotron 3 Nano Omni 30B A3B  NVIDIA       Open    72.2%
25  DeepSeek V4 Flash             DeepSeek     Open    71.2%
26  ZAYA1-8B                      Zyphra       Open    71.0%
27  Trinity-Large-Preview         Arcee AI     Open    63.3%
28  ZAYA1-74B-Preview             Zyphra       Open    57.3%
29  MiniCPM5-1B                   OpenBMB      Open    26.3%