SOTAVerified

Multiple-choice

Papers

Showing 591–600 of 1107 papers

| Title | Status | Hype |
|---|---|---|
| Can Model Uncertainty Function as a Proxy for Multiple-Choice Question Item Difficulty? | Code | 0 |
| Are Large Language Models Consistent over Value-laden Questions? | Code | 0 |
| Is Your Large Language Model Knowledgeable or a Choices-Only Cheater? | Code | 0 |
| CFinBench: A Comprehensive Chinese Financial Benchmark for Large Language Models | | 0 |
| DiVERT: Distractor Generation with Variational Errors Represented as Text for Math Multiple-choice Questions | Code | 0 |
| Changing Answer Order Can Decrease MMLU Accuracy | | 0 |
| Length Optimization in Conformal Prediction | Code | 0 |
| VarBench: Robust Language Model Benchmarking Through Dynamic Variable Perturbation | Code | 0 |
| Evaluating Visual and Cultural Interpretation: The K-Viscuit Benchmark with Human-VLM Collaboration | | 0 |
| SynDARin: Synthesising Datasets for Automated Reasoning in Low-Resource Languages | | 0 |

No leaderboard results yet.