Multiple-choice

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 876–900 of 1107 papers

Title	Date	Tasks	Status
PhysUniBench: An Undergraduate-Level Physics Reasoning Benchmark for Multimodal Models	Jun 21, 2025	Mathematical ReasoningMultiple-choice	—Unverified
Vision-Language Models Do Not Understand Negation	Jan 16, 2025	Multiple-choiceNegation	—Unverified
Predicting Item Survival for Multiple Choice Questions in a High-Stakes Medical Exam	May 1, 2020	Information RetrievalMultiple-choice	—Unverified
When Benchmarks are Targets: Revealing the Sensitivity of Large Language Model Leaderboards	Feb 1, 2024	Answer SelectionLanguage Modeling	CodeCode Available
Video Prediction via Selective Sampling	Dec 1, 2018	Multiple-choicePrediction	CodeCode Available
MCQG-SRefine: Multiple Choice Question Generation and Evaluation with Iterative Self-Critique, Correction, and Comparison Feedback	Oct 17, 2024	Fact VerificationHallucination	CodeCode Available
CRiskEval: A Chinese Multi-Level Risk Evaluation Benchmark Dataset for Large Language Models	Jun 7, 2024	Multiple-choicePhilosophy	CodeCode Available
Student Answer Forecasting: Transformer-Driven Answer Choice Prediction for Language Learning	May 30, 2024	MisconceptionsMultiple-choice	CodeCode Available
Automating Turkish Educational Quiz Generation Using Large Language Models	Jun 5, 2024	Multiple-choice	CodeCode Available
How Can We Diagnose and Treat Bias in Large Language Models for Clinical Decision-Making?	Oct 21, 2024	counterfactualDecision Making	CodeCode Available
Measuring Agreeableness Bias in Multimodal Models	Aug 17, 2024	Decision MakingMultiple-choice	CodeCode Available
CSEPrompts: A Benchmark of Introductory Computer Science Prompts	Apr 3, 2024	Multiple-choice	CodeCode Available
MedArabiQ: Benchmarking Large Language Models on Arabic Medical Tasks	May 6, 2025	BenchmarkingMultiple-choice	CodeCode Available
MedG-KRP: Medical Graph Knowledge Representation Probing	Dec 14, 2024	Multiple-choiceMultiple Choice Question Answering (MCQA)	CodeCode Available
How much do LLMs learn from negative examples?	Mar 18, 2025	Multiple-choiceQuestion Answering	CodeCode Available
CNN for Text-Based Multiple Choice Question Answering	Jul 1, 2018	Multiple-choiceQuestion Answering	CodeCode Available
DAHL: Domain-specific Automated Hallucination Evaluation of Long-Form Text through a Benchmark Dataset in Biomedicine	Nov 14, 2024	FormHallucination	CodeCode Available
Confident Multiple Choice Learning	Jun 12, 2017	General Classificationimage-classification	CodeCode Available
VisBias: Measuring Explicit and Implicit Social Biases in Vision Language Models	Mar 10, 2025	Image DescriptionMultiple-choice	CodeCode Available
A Simple Method for Commonsense Reasoning	Jun 7, 2018	Common Sense ReasoningCoreference Resolution	CodeCode Available
Chance-Constrained Multiple-Choice Knapsack Problem: Model, Algorithms, and Applications	Jun 26, 2023	Combinatorial OptimizationMultiple-choice	CodeCode Available
Biomedical Entity Linking as Multiple Choice Question Answering	Feb 23, 2024	Entity LinkingMultiple-choice	CodeCode Available
(WhyPHI) Fine-Tuning PHI-3 for Multiple-Choice Question Answering: Methodology, Results, and Challenges	Jan 3, 2025	Multiple-choiceQuestion Answering	CodeCode Available
DE-COP: Detecting Copyrighted Content in Language Models Training Data	Feb 15, 2024	Language ModelingLanguage Modelling	CodeCode Available
Patent Figure Classification using Large Vision-language Models	Jan 22, 2025	ClassificationFew-Shot Learning	CodeCode Available

Show:10 25 50

← PrevPage 36 of 45Next →

No leaderboard results yet.