SOTAVerified

Multiple-choice

Papers

Showing 301-310 of 1107 papers

Title | Status | Hype
AutoMCQ -- Automatically Generate Code Comprehension Questions using GenAI | | 0
KoBALT: Korean Benchmark For Advanced Linguistic Tasks | | 0
Robo2VLM: Visual Question Answering from Large-Scale In-the-Wild Robot Manipulation Datasets | | 0
Set-LLM: A Permutation-Invariant LLM | | 0
Improving LLM First-Token Predictions in Multiple-Choice Question Answering via Prefilling Attack | | 0
Uncovering Cultural Representation Disparities in Vision-Language Models | | 0
WirelessMathBench: A Mathematical Modeling Benchmark for LLMs in Wireless Communications | | 0
MR. Judge: Multimodal Reasoner as a Judge | | 0
LEXam: Benchmarking Legal Reasoning on 340 Law Exams | | 0
Teach2Eval: An Indirect Evaluation Method for LLM by Judging How It Teaches | Code | 0

No leaderboard results yet.