Multiple-choice

Papers

Showing 151–160 of 1107 papers

Title	Date	Tasks	Status	Hype
Big-Math: A Large-Scale, High-Quality Math Dataset for Reinforcement Learning in Language Models	Feb 24, 2025	GSM8K, Math	Code Available	2
AutoLogi: Automated Generation of Logic Puzzles for Evaluating Reasoning Abilities of Large Language Models	Feb 24, 2025	Logical Reasoning, Multiple-choice	Code Available	1
The Lazy Student's Dream: ChatGPT Passing an Engineering Course on Its Own	Feb 23, 2025	Multiple-choice	Unverified	0
LegalBench.PT: A Benchmark for Portuguese Law	Feb 22, 2025	Multiple-choice	Unverified	0
Wrong Answers Can Also Be Useful: PlausibleQA -- A Large-Scale QA Dataset with Answer Plausibility Scores	Feb 22, 2025	Distractor Generation, Information Retrieval	Code Available	0
Moving Beyond Medical Exam Questions: A Clinician-Annotated Dataset of Real-World Tasks and Ambiguity in Mental Healthcare	Feb 22, 2025	Decision Making, Multiple-choice	Code Available	0
MHQA: A Diverse, Knowledge Intensive Mental Health Question Answering Challenge for Language Models	Feb 21, 2025	Benchmarking, Diagnostic	Unverified	0
Do LLMs Make Mistakes Like Students? Exploring Natural Alignment between Language Models and Human Error Patterns	Feb 21, 2025	Distractor Generation, Multiple-choice	Unverified	0
Unveiling Cultural Blind Spots: Analyzing the Limitations of mLLMs in Procedural Text Comprehension	Feb 20, 2025	Multiple-choice, Reading Comprehension	Unverified	0
MCQA-Eval: Efficient Confidence Evaluation in NLG with Gold-Standard Correctness Labels	Feb 20, 2025	Multiple-choice, Text Generation	Unverified	0

No leaderboard results yet.