SOTAVerified

Multiple-choice

Papers

Showing 151–175 of 1107 papers

Title | Status | Hype
CMMU: A Benchmark for Chinese Multi-modal Multi-type Question Understanding and Reasoning | Code | 1
AV-Odyssey Bench: Can Your Multimodal LLMs Really Understand Audio-Visual Information? | Code | 1
Clues Before Answers: Generation-Enhanced Multiple-Choice QA | Code | 1
IndicNLPSuite: Monolingual Corpora, Evaluation Benchmarks and Pre-trained Multilingual Language Models for Indian Languages | Code | 1
African or European Swallow? Benchmarking Large Vision-Language Models for Fine-Grained Object Classification | Code | 1
Benchmarking AI scientists in omics data-driven biological research | Code | 1
An MRC Framework for Semantic Role Labeling | Code | 1
Benchmarking Large Language Models on Answering and Explaining Challenging Medical Questions | Code | 1
InfiniBench: A Comprehensive Benchmark for Large Multimodal Models in Very Long Video Understanding | Code | 1
IntentionQA: A Benchmark for Evaluating Purchase Intention Comprehension Abilities of Language Models in E-commerce | Code | 1
Annealed Multiple Choice Learning: Overcoming limitations of Winner-takes-all with annealing | Code | 1
Conformal Prediction with Large Language Models for Multi-Choice Question Answering | Code | 1
Annealed Winner-Takes-All for Motion Forecasting | Code | 1
Counterfactual Variable Control for Robust and Interpretable Question Answering | Code | 1
An Open Source Data Contamination Report for Large Language Models | Code | 1
CHOICE: Benchmarking the Remote Sensing Capabilities of Large Vision-Language Models | Code | 1
Ranked Voting based Self-Consistency of Large Language Models | Code | 1
CUPCase: Clinically Uncommon Patient Cases and Diagnoses Dataset | Code | 1
Large Language Models Encode Clinical Knowledge | Code | 1
Hybrid-Level Instruction Injection for Video Token Compression in Multi-modal Large Language Models | Code | 1
IllusionVQA: A Challenging Optical Illusion Dataset for Vision Language Models | Code | 1
Is Bigger and Deeper Always Better? Probing LLaMA Across Scales and Layers | Code | 1
Delving into the Reversal Curse: How Far Can Large Language Models Generalize? | Code | 1
Let Androids Dream of Electric Sheep: A Human-like Image Implication Understanding and Reasoning Framework | Code | 1
ChartQAPro: A More Diverse and Challenging Benchmark for Chart Question Answering | Code | 1

No leaderboard results yet.