SOTAVerified

Multiple-choice

Papers

Showing 876–900 of 1107 papers

| Title | Status | Hype |
| --- | --- | --- |
| Controlling Cloze-test Question Item Difficulty with PLM-based Surrogate Models for IRT Assessment | | 0 |
| Conversational Assistants and Gender Stereotypes: Public Perceptions and Desiderata for Voice Personas | | 0 |
| ACPBench: Reasoning about Action, Change, and Planning | | 0 |
| Convolutional Spatial Attention Model for Reading Comprehension with Multiple-Choice Questions | | 0 |
| Cosmos QA: Machine Reading Comprehension with Contextual Commonsense Reasoning | | 0 |
| CP-Router: An Uncertainty-Aware Router Between LLM and LRM | | 0 |
| Cracking the Code: Multi-domain LLM Evaluation on Real-World Professional Exams in Indonesia | | 0 |
| CroaTPAS: A Survey-based Evaluation | | 0 |
| Template Filling for Controllable Commonsense Reasoning | | 0 |
| Crowd Labeling: a survey | | 0 |
| Crowdsourcing Multiple Choice Science Questions | | 0 |
| CS-NLP team at SemEval-2020 Task 4: Evaluation of State-of-the-art NLP Deep Learning Architectures on Commonsense Reasoning Task | | 0 |
| CSReader at SemEval-2018 Task 11: Multiple Choice Question Answering as Textual Entailment | | 0 |
| Tokenization Standards for Linguistic Integrity: Turkish as a Benchmark | | 0 |
| A Neural Question Answering Model Based on Semi-Structured Tables | | 0 |
| Zero-shot Event Causality Identification with Question Answering | | 0 |
| DARE: Diverse Visual Question Answering with Robustness Evaluation | | 0 |
| ACPBench Hard: Unrestrained Reasoning about Action, Change, and Planning | | 0 |
| Dataset Bias Mitigation in Multiple-Choice Visual Question Answering and Beyond | | 0 |
| Decision-Making Behavior Evaluation Framework for LLMs under Uncertain Context | | 0 |
| Deep learning for sentence clustering in essay grading support | | 0 |
| DeepQR: Neural-based Quality Ratings for Learnersourced Multiple-Choice Questions | | 0 |
| DeepSeek-R1 Outperforms Gemini 2.0 Pro, OpenAI o1, and o3-mini in Bilingual Complex Ophthalmology Reasoning | | 0 |
| Designing Templates for Eliciting Commonsense Knowledge from Pretrained Sequence-to-Sequence Models | | 0 |
| DeSIQ: Towards an Unbiased, Challenging Benchmark for Social Intelligence Understanding | | 0 |

No leaderboard results yet.