SOTAVerified

Multiple-choice

Papers

Showing 126-150 of 1107 papers

Title | Status | Hype
MindGames: Targeting Theory of Mind in Large Language Models with Dynamic Epistemic Modal Logic | Code | 1
IndicNLPSuite: Monolingual Corpora, Evaluation Benchmarks and Pre-trained Multilingual Language Models for Indian Languages | Code | 1
CodeApex: A Bilingual Programming Evaluation Benchmark for Large Language Models | Code | 1
A Hitchhikers Guide to Fine-Grained Face Forgery Detection Using Common Sense Reasoning | Code | 1
CoLoR-Filter: Conditional Loss Reduction Filtering for Targeted Language Model Pre-training | Code | 1
CMMU: A Benchmark for Chinese Multi-modal Multi-type Question Understanding and Reasoning | Code | 1
Counterfactual Variable Control for Robust and Interpretable Question Answering | Code | 1
InfiniBench: A Comprehensive Benchmark for Large Multimodal Models in Very Long Video Understanding | Code | 1
Clues Before Answers: Generation-Enhanced Multiple-Choice QA | Code | 1
Ranked Voting based Self-Consistency of Large Language Models | Code | 1
Hybrid-Level Instruction Injection for Video Token Compression in Multi-modal Large Language Models | Code | 1
An Open Source Data Contamination Report for Large Language Models | Code | 1
Annealed Winner-Takes-All for Motion Forecasting | Code | 1
ChartQAPro: A More Diverse and Challenging Benchmark for Chart Question Answering | Code | 1
Annealed Multiple Choice Learning: Overcoming limitations of Winner-takes-all with annealing | Code | 1
IllusionVQA: A Challenging Optical Illusion Dataset for Vision Language Models | Code | 1
INS-MMBench: A Comprehensive Benchmark for Evaluating LVLMs' Performance in Insurance | Code | 1
An MRC Framework for Semantic Role Labeling | Code | 1
African or European Swallow? Benchmarking Large Vision-Language Models for Fine-Grained Object Classification | Code | 1
GPT as Knowledge Worker: A Zero-Shot Evaluation of (AI)CPA Capabilities | Code | 1
An In-depth Look at Gemini's Language Abilities | Code | 1
Generating Distractors for Reading Comprehension Questions from Real Examinations | Code | 1
GPT Takes the Bar Exam | Code | 1
Fool Your (Vision and) Language Model With Embarrassingly Simple Permutations | Code | 1
Can large language models reason about medical questions? | Code | 1
