SOTAVerified

Multiple-choice

Papers

Showing 851-900 of 1107 papers

| Title | Status | Hype |
| --- | --- | --- |
| CinePile: A Long Video Question Answering Dataset and Benchmark | | 0 |
| Cleared for Takeoff? Compositional & Conditional Reasoning may be the Achilles Heel to (Flight-Booking) Language Agents | | 0 |
| ClinBench-HPB: A Clinical Benchmark for Evaluating LLMs in Hepato-Pancreato-Biliary Diseases | | 0 |
| An Experimental Study of Deep Neural Network Models for Vietnamese Multiple-Choice Reading Comprehension | | 0 |
| CLIP-UP: CLIP-Based Unanswerable Problem Detection for Visual Question Answering | | 0 |
| Clozer: Adaptable Data Augmentation for Cloze-style Reading Comprehension | | 0 |
| Clozer: Adaptable Data Augmentation for Cloze-style Reading Comprehension | | 0 |
| Think you have Solved Direct-Answer Question Answering? Try ARC-DA, the Direct-Answer AI2 Reasoning Challenge | | 0 |
| A New Era: Intelligent Tutoring Systems Will Transform Online Learning for Millions | | 0 |
| CoddLLM: Empowering Large Language Models for Data Analytics | | 0 |
| CodeReviewQA: The Code Review Comprehension Assessment for Large Language Models | | 0 |
| COGNET-MD, an evaluation framework and dataset for Large Language Model benchmarks in the medical domain | | 0 |
| Cognitive Biases in Large Language Models: A Survey and Mitigation Experiments | | 0 |
| Collaboration among Multiple Large Language Models for Medical Question Answering | | 0 |
| Thrilled by Your Progress! Large Language Models (GPT-4) No Longer Struggle to Pass Assessments in Higher Education Programming Courses | | 0 |
| Combinatorial framework for planning in geological exploration | | 0 |
| Combining Multiple Cues for Visual Madlibs Question Answering | | 0 |
| Comparative Study of Learning Outcomes for Online Learning Platforms | | 0 |
| Thunder-NUBench: A Benchmark for LLMs' Sentence-Level Negation Understanding | | 0 |
| Confidence-Aware Learning Assistant | | 0 |
| You Can Do Better! If You Elaborate the Reason When Making Prediction | | 0 |
| Context-guided Triple Matching for Multiple Choice Question Answering | | 0 |
| Context-guided Triple Matching for Multiple Choice Question Answering | | 0 |
| Context Modeling with Evidence Filter for Multiple Choice Question Answering | | 0 |
| Contextual Response Interpretation for Automated Structured Interviews: A Case Study in Market Research | | 0 |
| Controlling Cloze-test Question Item Difficulty with PLM-based Surrogate Models for IRT Assessment | | 0 |
| Conversational Assistants and Gender Stereotypes: Public Perceptions and Desiderata for Voice Personas | | 0 |
| ACPBench: Reasoning about Action, Change, and Planning | | 0 |
| Convolutional Spatial Attention Model for Reading Comprehension with Multiple-Choice Questions | | 0 |
| Cosmos QA: Machine Reading Comprehension with Contextual Commonsense Reasoning | | 0 |
| CP-Router: An Uncertainty-Aware Router Between LLM and LRM | | 0 |
| Cracking the Code: Multi-domain LLM Evaluation on Real-World Professional Exams in Indonesia | | 0 |
| CroaTPAS: A Survey-based Evaluation | | 0 |
| Template Filling for Controllable Commonsense Reasoning | | 0 |
| Crowd Labeling: a survey | | 0 |
| Crowdsourcing Multiple Choice Science Questions | | 0 |
| CS-NLP team at SemEval-2020 Task 4: Evaluation of State-of-the-art NLP Deep Learning Architectures on Commonsense Reasoning Task | | 0 |
| CSReader at SemEval-2018 Task 11: Multiple Choice Question Answering as Textual Entailment | | 0 |
| Tokenization Standards for Linguistic Integrity: Turkish as a Benchmark | | 0 |
| A Neural Question Answering Model Based on Semi-Structured Tables | | 0 |
| Zero-shot Event Causality Identification with Question Answering | | 0 |
| DARE: Diverse Visual Question Answering with Robustness Evaluation | | 0 |
| ACPBench Hard: Unrestrained Reasoning about Action, Change, and Planning | | 0 |
| Dataset Bias Mitigation in Multiple-Choice Visual Question Answering and Beyond | | 0 |
| Decision-Making Behavior Evaluation Framework for LLMs under Uncertain Context | | 0 |
| Deep learning for sentence clustering in essay grading support | | 0 |
| DeepQR: Neural-based Quality Ratings for Learnersourced Multiple-Choice Questions | | 0 |
| DeepSeek-R1 Outperforms Gemini 2.0 Pro, OpenAI o1, and o3-mini in Bilingual Complex Ophthalmology Reasoning | | 0 |
| Designing Templates for Eliciting Commonsense Knowledge from Pretrained Sequence-to-Sequence Models | | 0 |
| DeSIQ: Towards an Unbiased, Challenging Benchmark for Social Intelligence Understanding | | 0 |

No leaderboard results yet.