SOTAVerified

Multiple-choice

Papers

Showing 251–300 of 1107 papers

Title | Status | Hype
GPT Takes the Bar Exam | Code | 1
Explicit Planning Helps Language Models in Logical Reasoning | Code | 1
Evaluating the Knowledge Dependency of Questions | Code | 1
ExplaGraphs: An Explanation Graph Generation Task for Structured Commonsense Reasoning | Code | 1
IllusionVQA: A Challenging Optical Illusion Dataset for Vision Language Models | Code | 1
LLMs Are Biased Towards Output Formats! Systematically Evaluating and Mitigating Output Format Bias of LLMs | Code | 1
Multiple-Choice Questions are Efficient and Robust LLM Evaluators | Code | 1
Controlling Cloze-test Question Item Difficulty with PLM-based Surrogate Models for IRT Assessment | - | 0
Contextual Response Interpretation for Automated Structured Interviews: A Case Study in Market Research | - | 0
Analysing the Effect of Masking Length Distribution of MLM: An Evaluation Framework and Case Study on Chinese MRC Datasets | - | 0
Context Modeling with Evidence Filter for Multiple Choice Question Answering | - | 0
Context-guided Triple Matching for Multiple Choice Question Answering | - | 0
AstroMLab 1: Who Wins Astronomy Jeopardy!? | - | 0
Evaluating LLM-corrupted Crowdsourcing Data Without Ground Truth | - | 0
Context-guided Triple Matching for Multiple Choice Question Answering | - | 0
A statistical model for aggregating judgments by incorporating peer predictions | - | 0
Advanced Financial Reasoning at Scale: A Comprehensive Evaluation of Large Language Models on CFA Level III | - | 0
Addressing Blind Guessing: Calibration of Selection Bias in Multiple-Choice Question Answering by Video Language Models | - | 0
Evaluating LLM -- Generated Multimodal Diagnosis from Medical Images and Symptom Analysis | - | 0
Confidence-Aware Learning Assistant | - | 0
Comparative Study of Learning Outcomes for Online Learning Platforms | - | 0
Assessing Large Language Models in Mechanical Engineering Education: A Study on Mechanics-Focused Conceptual Understanding | - | 0
An Algorithm for Generating Gap-Fill Multiple Choice Questions of an Expert System | - | 0
Combining Multiple Cues for Visual Madlibs Question Answering | - | 0
Combinatorial framework for planning in geological exploration | - | 0
Assessing Distractors in Multiple-Choice Tests | - | 0
Assessing AI-Generated Questions' Alignment with Cognitive Frameworks in Educational Assessment | - | 0
An AI-based Solution for Enhancing Delivery of Digital Learning for Future Teachers | - | 0
Evaluating LLMs on Document-Based QA: Exact Answer Selection and Numerical Extraction using Cogtale dataset | - | 0
Collaboration among Multiple Large Language Models for Medical Question Answering | - | 0
Cognitive Biases in Large Language Models: A Survey and Mitigation Experiments | - | 0
An Add-On for Empowering Google Forms to be an Automatic Question Generator in Online Assessments | - | 0
COGNET-MD, an evaluation framework and dataset for Large Language Model benchmarks in the medical domain | - | 0
CodeReviewQA: The Code Review Comprehension Assessment for Large Language Models | - | 0
A Shortcut-aware Video-QA Benchmark for Physical Understanding via Minimal Video Pairs | - | 0
A Data-Driven Study of Commonsense Knowledge using the ConceptNet Knowledge Base | - | 0
CoddLLM: Empowering Large Language Models for Data Analytics | - | 0
A Semantic Parsing Algorithm to Solve Linear Ordering Problems | - | 0
A Semantic Feature-Wise Transformation Relation Network for Automatic Short Answer Grading | - | 0
From Human Days to Machine Seconds: Automatically Answering and Generating Machine Learning Final Exams | - | 0
Establishing Task Scaling Laws via Compute-Efficient Model Ladders | - | 0
Aryl: An Elastic Cluster Scheduler for Deep Learning | - | 0
Clozer: Adaptable Data Augmentation for Cloze-style Reading Comprehension | - | 0
Clozer: Adaptable Data Augmentation for Cloze-style Reading Comprehension | - | 0
Amobee at SemEval-2019 Tasks 5 and 6: Multiple Choice CNN Over Contextual Embedding | - | 0
Enhancing Video-LLM Reasoning via Agent-of-Thoughts Distillation | - | 0
CLIP-UP: CLIP-Based Unanswerable Problem Detection for Visual Question Answering | - | 0
A Method for Building a Commonsense Inference Dataset based on Basic Events | - | 0
ClinBench-HPB: A Clinical Benchmark for Evaluating LLMs in Hepato-Pancreato-Biliary Diseases | - | 0
Enhancing Multiple-Choice Question Answering with Causal Knowledge | - | 0

No leaderboard results yet.