context.vn

HLE (No Tools)

16 models evaluated

#   Model                       Provider   Type    Score
1   Claude Mythos Preview       Anthropic  Closed  56.8%
2   Claude Opus 4.8             Anthropic  Closed  49.8%
3   Claude Opus 4.7 (Adaptive)  Anthropic  Closed  46.9%
4   Gemini 3.1 Pro              Google     Closed  45.4%
5   GPT-5.5 Pro                 OpenAI     Closed  43.1%
6   Muse Spark                  Meta       Closed  42.8%
7   GPT-5.4 Pro                 OpenAI     Closed  42.7%
8   GPT-5.5                     OpenAI     Closed  41.4%
9   Claude Opus 4.6             Anthropic  Closed  40%
10  GPT-5.4                     OpenAI     Closed  39.8%
11  MiMo-V2.5-Pro               Xiaomi     Closed  34%
12  Grok 4.20                   xAI        Closed  31.6%
13  GPT-5.4 mini                OpenAI     Closed  28.2%
14  GPT-5.4 nano                OpenAI     Closed  24.3%
15  Gemma 4 31B                 Google     Open    19.5%
16  Gemma 4 26B A4B             Google     Open    8.7%