context.vn

Terminal-Bench 2

24 models evaluated

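| # | Model | Provider | Type | Score |
|---|-------|----------|------|-------|
| 1 | GPT-5.5 | OpenAI | Closed | 82.0% |
| 2 | Gemini 3.5 Flash | Google | Closed | 76.2% |
| 3 | Claude Opus 4.8 | Anthropic | Closed | 74.6% |
| 4 | Qwen3.7 Max | Alibaba | Closed | 69.7% |
| 5 | Claude Opus 4.7 (Adaptive) | Anthropic | Closed | 69.4% |
| 6 | Composer 2.5 | Cursor | Closed | 69.3% |
| 7 | MiMo-V2.5-Pro | Xiaomi | Closed | 68.4% |
| 8 | DeepSeek V4 Pro (Max) | DeepSeek | Open | 67.9% |
| 9 | Kimi K2.6 | Moonshot AI | Open | 66.7% |
| 10 | MiniMax M3 | MiniMax | Open | 66.0% |
| 11 | MiMo-V2.5 | Xiaomi | Closed | 65.8% |
| 12 | Qwen 3.6 Max (preview) | Alibaba | Closed | 65.4% |
| 13 | DeepSeek V4 Pro (High) | DeepSeek | Open | 63.3% |
| 14 | Composer 2 | Cursor | Closed | 61.7% |
| 15 | Step 3.7 Flash | StepFun | Open | 59.5% |
| 16 | Qwen3.6-27B | Alibaba | Open | 59.3% |
| 17 | DeepSeek V4 Pro | DeepSeek | Open | 59.1% |
| 18 | DeepSeek V4 Flash (Max) | DeepSeek | Open | 56.9% |
| 19 | DeepSeek V4 Flash (High) | DeepSeek | Open | 56.6% |
| 20 | Hy3 Preview | Tencent | Open | 54.4% |
| 21 | Qwen3.6-35B-A3B | Alibaba | Open | 51.5% |
| 22 | DeepSeek V4 Flash | DeepSeek | Open | 49.1% |
| 23 | Laguna M.1 | Poolside | Closed | 45.8% |
| 24 | Laguna XS.2 | Poolside | Open | 35.7% |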