SOTAVerified

Multiple-choice

Papers

Showing 381–390 of 1107 papers

| Title | Status | Hype |
| --- | --- | --- |
| Reversal Blessing: Thinking Backward May Outpace Thinking Forward in Multi-choice Questions | | 0 |
| DeepSeek-R1 Outperforms Gemini 2.0 Pro, OpenAI o1, and o3-mini in Bilingual Complex Ophthalmology Reasoning | | 0 |
| The Lazy Student's Dream: ChatGPT Passing an Engineering Course on Its Own | | 0 |
| Wrong Answers Can Also Be Useful: PlausibleQA -- A Large-Scale QA Dataset with Answer Plausibility Scores | Code | 0 |
| LegalBench.PT: A Benchmark for Portuguese Law | | 0 |
| Moving Beyond Medical Exam Questions: A Clinician-Annotated Dataset of Real-World Tasks and Ambiguity in Mental Healthcare | Code | 0 |
| MHQA: A Diverse, Knowledge Intensive Mental Health Question Answering Challenge for Language Models | | 0 |
| Do LLMs Make Mistakes Like Students? Exploring Natural Alignment between Language Models and Human Error Patterns | | 0 |
| Unveiling Cultural Blind Spots: Analyzing the Limitations of mLLMs in Procedural Text Comprehension | | 0 |
| MCQA-Eval: Efficient Confidence Evaluation in NLG with Gold-Standard Correctness Labels | | 0 |

No leaderboard results yet.