SOTAVerified

Multiple-choice

Papers

Showing 101-110 of 1107 papers

Title | Status | Hype
ArabicMMLU: Assessing Massive Multitask Language Understanding in Arabic | Code | 1
Explaining NLP Models via Minimal Contrastive Editing (MiCE) | Code | 1
FaceXBench: Evaluating Multimodal LLMs on Face Understanding | Code | 1
FarsTail: A Persian Natural Language Inference Dataset | Code | 1
IntentionQA: A Benchmark for Evaluating Purchase Intention Comprehension Abilities of Language Models in E-commerce | Code | 1
Do Large Language Models Understand Conversational Implicature -- A case study with a chinese sitcom | Code | 1
All Languages Matter: Evaluating LMMs on Culturally Diverse 100 Languages | Code | 1
Delving into the Reversal Curse: How Far Can Large Language Models Generalize? | Code | 1
Fool Your (Vision and) Language Model With Embarrassingly Simple Permutations | Code | 1
Daily-Omni: Towards Audio-Visual Reasoning with Temporal Alignment across Modalities | Code | 1
Page 11 of 111

No leaderboard results yet.