context.vn

tau3 Bench

9 models evaluated

#  Model                    Provider     Type    Score
1  Mistral Medium 3.5 128B  Mistral      Open    91.4%
2  MiMo-V2.5-Pro            Xiaomi       Closed  72.9%
3  Qwen3.6 Plus             Alibaba      Closed  70.7%
4  GLM-5.1                  Z.AI         Open    70.6%
5  Claude Opus 4.5          Anthropic    Closed  70.2%
6  Qwen3.5 397B             Alibaba      Open    68.4%
7  Qwen3.6-35B-A3B          Alibaba      Open    67.2%
8  Kimi K2.5                Moonshot AI  Open    65.7%
9  GLM-5                    Z.AI         Open    65.6%