context.vn

CharXiv

22 models evaluated

#   Model                         Provider     Type    Score
1   Claude Mythos Preview         Anthropic    Closed  93.2%
2   Claude Opus 4.7 (Adaptive)    Anthropic    Closed  91.0%
3   Claude Opus 4.8               Anthropic    Closed  89.9%
4   Muse Spark                    Meta         Closed  86.4%
5   Gemini 3.5 Flash              Google       Closed  84.2%
6   GPT-5.4                       OpenAI       Closed  82.8%
7   GPT-5.2                       OpenAI       Closed  82.1%
8   Qwen3.6 Plus                  Alibaba      Closed  81.5%
9   Gemini 3 Pro                  Google       Closed  81.4%
10  MiMo-V2.5                     Xiaomi       Closed  81.0%
11  Qwen3.5 397B                  Alibaba      Open    80.8%
12  Kimi K2.6                     Moonshot AI  Open    80.4%
13  Gemini 3.1 Pro                Google       Closed  80.2%
14  Qwen3.6-27B                   Alibaba      Open    78.4%
15  Qwen3.6-35B-A3B               Alibaba      Open    78.0%
16  Claude Sonnet 4.6             Anthropic    Closed  77.4%
17  Qwen3.5-122B-A10B             Alibaba      Open    77.2%
18  Nemotron 3 Nano Omni 30B A3B  NVIDIA       Open    76.3%
19  Gemini 3.1 Flash-Lite         Google       Closed  73.2%
20  Claude Opus 4.5               Anthropic    Closed  68.5%
21  Grok 4.20                     xAI          Closed  60.9%
22  Command A+                    Cohere       Open    52.7%