context.vn

HellaSwag

2 models evaluated

#   Model                      Provider   Type   Score
1   DeepSeek V4 Pro Base       DeepSeek   Open   88.0%
2   DeepSeek V4 Flash Base     DeepSeek   Open   85.7%