SOTAVerified

Multiple-choice

Papers

Showing 421-430 of 1107 papers

| Title | Status | Hype |
| --- | --- | --- |
| Changing Answer Order Can Decrease MMLU Accuracy | | 0 |
| Length Optimization in Conformal Prediction | Code | 0 |
| DiVERT: Distractor Generation with Variational Errors Represented as Text for Math Multiple-choice Questions | Code | 0 |
| VarBench: Robust Language Model Benchmarking Through Dynamic Variable Perturbation | Code | 0 |
| Evaluating Visual and Cultural Interpretation: The K-Viscuit Benchmark with Human-VLM Collaboration | | 0 |
| HCQA @ Ego4D EgoSchema Challenge 2024 | Code | 1 |
| African or European Swallow? Benchmarking Large Vision-Language Models for Fine-Grained Object Classification | Code | 1 |
| SynDARin: Synthesising Datasets for Automated Reasoning in Low-Resource Languages | | 0 |
| QRMeM: Unleash the Length Limitation through Question then Reflection Memory Mechanism | | 0 |
| ClinicalLab: Aligning Agents for Multi-Departmental Clinical Diagnostics in the Real World | Code | 2 |

No leaderboard results yet.