context.vn

HealthBench Hard

5 models evaluated

#  Model            Provider   Type    Score
1  Muse Spark       Meta       Closed  42.8%
2  GPT-5.4          OpenAI     Closed  40.1%
3  Gemini 3.1 Pro   Google     Closed  20.6%
4  Grok 4.20        xAI        Closed  20.3%
5  Claude Opus 4.6  Anthropic  Closed  14.8%