SOTAVerified

Multiple-choice

Papers

Showing 131–140 of 1107 papers

| Title | Status | Hype |
| --- | --- | --- |
| African or European Swallow? Benchmarking Large Vision-Language Models for Fine-Grained Object Classification | Code | 1 |
| FoodieQA: A Multimodal Dataset for Fine-Grained Understanding of Chinese Food Culture | Code | 1 |
| CoLoR-Filter: Conditional Loss Reduction Filtering for Targeted Language Model Pre-training | Code | 1 |
| IntentionQA: A Benchmark for Evaluating Purchase Intention Comprehension Abilities of Language Models in E-commerce | Code | 1 |
| BLEnD: A Benchmark for LLMs on Everyday Knowledge in Diverse Cultures and Languages | Code | 1 |
| INS-MMBench: A Comprehensive Benchmark for Evaluating LVLMs' Performance in Insurance | Code | 1 |
| MuirBench: A Comprehensive Benchmark for Robust Multi-image Understanding | Code | 1 |
| A Fine-tuning Dataset and Benchmark for Large Language Models for Protein Understanding | Code | 1 |
| TopViewRS: Vision-Language Models as Top-View Spatial Reasoners | Code | 1 |
| Embedding Trajectory for Out-of-Distribution Detection in Mathematical Reasoning | Code | 1 |

No leaderboard results yet.