context.vn

BullshitBench V2

63 models evaluated

| # | Model | Model ID | Provider | Type | Score |
|---|---|---|---|---|---|
| 4 | Claude Opus 4.6 (high) | anthropic/claude-opus-4.6@reasoning=high | Anthropic | Closed | |
| 6 | Claude Opus 4.7 (none) | anthropic/claude-opus-4.7@reasoning=none | Anthropic | Closed | |
| 7 | Claude Sonnet 4.5 (high) | anthropic/claude-sonnet-4.5@reasoning=high | Anthropic | Closed | |
| 9 | Qwen3.5 397B (Reasoning) (high) | qwen/qwen3.5-397b-a17b@reasoning=high | Alibaba | Open | |
| 10 | Claude Haiku 4.5 (high) | anthropic/claude-haiku-4.5@reasoning=high | Anthropic | Closed | |
| 13 | Qwen3.6 Plus (none) | qwen/qwen3.6-plus@reasoning=none | Alibaba | Closed | |
| 15 | Qwen3.5 397B (none) | qwen/qwen3.5-397b-a17b@reasoning=none | Alibaba | Open | |
| 17 | Kimi K2.6 (none) | moonshotai/kimi-k2.6@reasoning=none | Moonshot AI | Open | |
| 20 | MiMo-V2.5-Pro (xhigh) | xiaomi/mimo-v2.5-pro@reasoning=xhigh | Xiaomi | Closed | |
| 25 | Kimi K2.5 (none) | moonshotai/kimi-k2.5@reasoning=none | Moonshot AI | Open | |
| 26 | Grok 4.3 (minimal) | x-ai/grok-4.3@reasoning=minimal | xAI | Closed | |
| 30 | GPT-5.4 (none) | openai/gpt-5.4@reasoning=none | OpenAI | Closed | |
| 31 | Gemini 3 Pro (low) | google/gemini-3-pro-preview@reasoning=low | Google | Closed | |
| 32 | GPT-5.5 (xhigh) | openai/gpt-5.5@reasoning=xhigh | OpenAI | Closed | |
| 38 | GPT-5.2-Codex (low) | openai/gpt-5.2-codex@reasoning=low | OpenAI | Closed | |
| 39 | Claude 3.5 Sonnet | anthropic/claude-3.5-sonnet@reasoning=default | Anthropic | Closed | |
| 40 | GPT-5.1 | openai/gpt-5.1-chat@reasoning=default | OpenAI | Closed | |
| 41 | Claude 4.1 Opus (none) | anthropic/claude-opus-4.1@reasoning=none | Anthropic | Closed | |
| 47 | GPT-5.3 Instant | openai/gpt-5.3-chat@reasoning=default | OpenAI | Closed | |
| 50 | GPT-5.2 (none) | openai/gpt-5.2@reasoning=none | OpenAI | Closed | |
| 52 | Gemini 3.1 Pro (low) | google/gemini-3.1-pro-preview@reasoning=low | Google | Closed | |
| 55 | GPT-5.5 Pro (xhigh) | openai/gpt-5.5-pro@reasoning=xhigh | OpenAI | Closed | |
| 56 | Gemini 3 Pro Deep Think (high) | google/gemini-3-pro-preview@reasoning=high | Google | Closed | |
| 57 | MiMo-V2.5 (xhigh) | xiaomi/mimo-v2.5@reasoning=xhigh | Xiaomi | Closed | |
| 62 | GPT-5.4 mini (high) | openai/gpt-5.4-mini@reasoning=high | OpenAI | Closed | |
| 64 | GPT-5.1-Codex-Max | openai/gpt-5.1-codex@reasoning=default | OpenAI | Closed | |
| 66 | Kimi K2.5 (Reasoning) (high) | moonshotai/kimi-k2.5@reasoning=high | Moonshot AI | Closed | |
| 68 | GLM-5-Turbo (high) | z-ai/glm-5-turbo@reasoning=high | Z.AI | Closed | |
| 70 | Claude 4 Sonnet (high) | anthropic/claude-sonnet-4@reasoning=high | Anthropic | Closed | |
| 73 | Llama 4 Maverick | meta-llama/llama-4-maverick@reasoning=default | Meta | Open | |
| 74 | GLM-5 (Reasoning) (high) | z-ai/glm-5@reasoning=high | Z.AI | Open | |
| 76 | GPT-5.2 Instant | openai/gpt-5.2-chat@reasoning=default | OpenAI | Closed | |
| 77 | o3 | openai/o3@reasoning=default | OpenAI | Closed | |
| 80 | Gemma 4 31B (high) | google/gemma-4-31b-it@reasoning=high | Google | Open | |
| 81 | GPT-5.3 Codex (low) | openai/gpt-5.3-codex@reasoning=low | OpenAI | Closed | |
| 84 | GLM-5.1 (xhigh) | z-ai/glm-5.1@reasoning=xhigh | Z.AI | Open | |
| 85 | Step 3.5 Flash (xhigh) | stepfun/step-3.5-flash@reasoning=xhigh | StepFun | Open | |
| 87 | Gemma 4 26B A4B (xhigh) | google/gemma-4-26b-a4b-it@reasoning=xhigh | Google | Open | |
| 90 | Gemini 2.5 Pro | google/gemini-2.5-pro@reasoning=default | Google | Closed | |
| 91 | GLM-5 (none) | z-ai/glm-5@reasoning=none | Z.AI | Open | |
| 93 | Gemini 3.5 Flash (xhigh) | google/gemini-3.5-flash@reasoning=xhigh | Google | Closed | |
| 95 | Grok 4.1 Fast (high) | x-ai/grok-4.1-fast@reasoning=high | xAI | Closed | |
| 96 | Llama 4 Scout | meta-llama/llama-4-scout@reasoning=default | Meta | Open | |
| 97 | Gemini 2.5 Flash | google/gemini-2.5-flash@reasoning=default | Google | Closed | |
| 100 | DeepSeek V4 Flash (none) | deepseek/deepseek-v4-flash@reasoning=none | DeepSeek | Open | |
| 102 | Trinity-Large-Thinking (minimal) | arcee-ai/trinity-large-thinking@reasoning=minimal | Arcee AI | Open | |
| 103 | MiMo-V2-Flash (none) | xiaomi/mimo-v2-flash@reasoning=none | Xiaomi | Open | |
| 104 | Hy3 Preview (none) | tencent/hy3-preview:free@reasoning=none | Tencent | Open | |
| 106 | DeepSeek V4 Pro (xhigh) | deepseek/deepseek-v4-pro@reasoning=xhigh | DeepSeek | Open | |
| 108 | GPT-5.4 nano (high) | openai/gpt-5.4-nano@reasoning=high | OpenAI | Closed | |
| 110 | GPT-4.1 | openai/gpt-4.1@reasoning=default | OpenAI | Closed | |
| 113 | DeepSeek V3.2 (Thinking) (high) | deepseek/deepseek-v3.2@reasoning=high | DeepSeek | Open | |
| 120 | Seed 1.6 (none) | bytedance-seed/seed-1.6@reasoning=none | ByteDance | Closed | |
| 121 | GPT-OSS 120B (low) | openai/gpt-oss-120b@reasoning=low | OpenAI | Open | |
| 124 | Gemini 3 Flash (high) | google/gemini-3-flash-preview@reasoning=high | Google | Closed | |
| 125 | DeepSeek V3.2 (none) | deepseek/deepseek-v3.2@reasoning=none | DeepSeek | Open | |
| 126 | Claude 3 Haiku | anthropic/claude-3-haiku@reasoning=default | Anthropic | Closed | |
| 129 | Kimi K2 | moonshotai/kimi-k2@reasoning=default | Moonshot AI | Closed | |
| 131 | MiniMax M2.5 (low) | minimax/minimax-m2.5@reasoning=low | MiniMax | Closed | |
| 134 | GLM-4.5 (xhigh) | z-ai/glm-4.5@reasoning=xhigh | Z.AI | Closed | |
| 135 | MiniMax M2.7 (high) | minimax/minimax-m2.7@reasoning=high | MiniMax | Open | |
| 136 | DeepSeek-R1 (xhigh) | deepseek/deepseek-r1@reasoning=xhigh | DeepSeek | Open | |
| 137 | o4-mini (high) (low) | openai/o4-mini@reasoning=low | OpenAI | Closed | |