
Multiple-choice

Papers

Showing 101-125 of 1107 papers

Title | Status | Hype
VisOnlyQA: Large Vision Language Models Still Struggle with Visual Perception of Geometric Information | Code | 1
CHOICE: Benchmarking the Remote Sensing Capabilities of Large Vision-Language Models | Code | 1
All Languages Matter: Evaluating LMMs on Culturally Diverse 100 Languages | Code | 1
VidComposition: Can MLLMs Analyze Compositions in Compiled Videos? | Code | 1
MEG: Medical Knowledge-Augmented Large Language Models for Question Answering | Code | 1
MILU: A Multi-task Indic Language Understanding Benchmark | Code | 1
Delving into the Reversal Curse: How Far Can Large Language Models Generalize? | Code | 1
TimeSeriesExam: A time series understanding exam | Code | 1
WorldMedQA-V: a multilingual, multimodal medical examination dataset for multimodal language models evaluation | Code | 1
MMIE: Massive Multimodal Interleaved Comprehension Benchmark for Large Vision-Language Models | Code | 1
Taming Overconfidence in LLMs: Reward Calibration in RLHF | Code | 1
SPORTU: A Comprehensive Sports Understanding Benchmark for Multimodal Large Language Models | Code | 1
MedQA-CS: Benchmarking Large Language Models Clinical Skills Using an AI-SCE Framework | Code | 1
A Hitchhiker's Guide to Fine-Grained Face Forgery Detection Using Common Sense Reasoning | Code | 1
Boosting Healthcare LLMs Through Retrieved Context | Code | 1
Annealed Winner-Takes-All for Motion Forecasting | Code | 1
Training on the Benchmark Is Not All You Need | Code | 1
TourSynbio: A Multi-Modal Large Model and Agent Framework to Bridge Text and Protein Sequences for Protein Engineering | Code | 1
Enhancing Knowledge Tracing with Concept Map and Response Disentanglement | Code | 1
LLMs Are Biased Towards Output Formats! Systematically Evaluating and Mitigating Output Format Bias of LLMs | Code | 1
Annealed Multiple Choice Learning: Overcoming limitations of Winner-takes-all with annealing | Code | 1
Evaluating language models as risk scores | Code | 1
TurkishMMLU: Measuring Massive Multitask Language Understanding in Turkish | Code | 1
Fine-tuning Multimodal Large Language Models for Product Bundling | Code | 1
Uncertainty is Fragile: Manipulating Uncertainty in Large Language Models | Code | 1

No leaderboard results yet.