context.vn

SWE-bench Verified

49 models evaluated

# | Model | Provider | Type | Score
1 | Claude Mythos Preview | Anthropic | Closed | 93.9%
2 | Claude Opus 4.8 | Anthropic | Closed | 88.6%
3 | Claude Opus 4.7 (Adaptive) | Anthropic | Closed | 87.6%
4 | GPT-5.3 Codex | OpenAI | Closed | 85%
5 | Claude Opus 4.5 | Anthropic | Closed | 80.9%
6 | Claude Opus 4.6 | Anthropic | Closed | 80.8%
7 | DeepSeek V4 Pro (Max) | DeepSeek | Open | 80.6%
8 | MiniMax M3 | MiniMax | Open | 80.5%
9 | Qwen3.7 Max | Alibaba | Closed | 80.4%
10 | Kimi K2.6 | Moonshot AI | Open | 80.2%
11 | GPT-5.2 | OpenAI | Closed | 80%
12 | Claude Sonnet 4.6 | Anthropic | Closed | 79.6%
13 | DeepSeek V4 Pro (High) | DeepSeek | Open | 79.4%
14 | DeepSeek V4 Flash (Max) | DeepSeek | Open | 79%
15 | Qwen3.6 Plus | Alibaba | Closed | 78.8%
16 | DeepSeek V4 Flash (High) | DeepSeek | Open | 78.6%
17 | MiMo-V2-Pro | Xiaomi | Closed | 78%
18 | GLM-5 | Z.AI | Open | 77.8%
19 | Mistral Medium 3.5 128B | Mistral | Open | 77.6%
20 | Muse Spark | Meta | Closed | 77.4%
21 | Qwen3.6-27B | Alibaba | Open | 77.2%
22 | Claude Sonnet 4.5 | Anthropic | Closed | 77.2%
23 | Kimi K2.5 (Reasoning) | Moonshot AI | Closed | 76.8%
24 | Kimi K2.5 | Moonshot AI | Open | 76.8%
25 | Grok 4.20 | xAI | Closed | 76.7%
26 | Qwen3.5 397B | Alibaba | Open | 76.2%
27 | MiMo-V2-Omni | Xiaomi | Closed | 74.8%
28 | Laguna M.1 | Poolside | Closed | 74.6%
29 | Claude 4.1 Opus | Anthropic | Closed | 74.5%
30 | Hy3 Preview | Tencent | Open | 74.4%
31 | GLM-4.7 | Z.AI | Open | 73.8%
32 | DeepSeek V4 Flash | DeepSeek | Open | 73.7%
33 | DeepSeek V4 Pro | DeepSeek | Open | 73.6%
34 | Qwen3.6-35B-A3B | Alibaba | Open | 73.4%
35 | MiMo-V2-Flash | Xiaomi | Open | 73.4%
36 | Claude Haiku 4.5 | Anthropic | Closed | 73.3%
37 | Claude 4 Sonnet | Anthropic | Closed | 72.7%
38 | Qwen3.5-27B | Alibaba | Open | 72.4%
39 | Qwen3.5-122B-A10B | Alibaba | Open | 72%
40 | Grok Code Fast 1 | xAI | Closed | 70.8%
41 | Laguna XS.2 | Poolside | Open | 69.9%
42 | Qwen3.5-35B-A3B | Alibaba | Open | 69.2%
43 | Gemini 2.5 Pro | Google | Closed | 63.8%
44 | GPT-4.1 | OpenAI | Closed | 54.6%
45 | ZAYA1-74B-Preview | Zyphra | Open | 53.2%
46 | o3-mini | OpenAI | Closed | 49.3%
47 | Claude 3.5 Sonnet | Anthropic | Closed | 49%
48 | DeepSeek V3 | DeepSeek | Open | 42%
49 | GPT-4.1 mini | OpenAI | Closed | 23.6%