Multiple-choice

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 301–350 of 1107 papers

Title	Date	Tasks	Status	Score
Can We Guide a Multi-Hop Reasoning Language Model to Incrementally Learn at Each Single-Hop?	Oct 1, 2022	Language ModelingLanguage Modelling	CodeCode Available	5
Can multiple-choice questions really be useful in detecting the abilities of LLMs?	Mar 26, 2024	Multiple-choiceQuestion Answering	CodeCode Available	5
Mixed-R1: Unified Reward Perspective For Reasoning Capability in Multimodal Large Language Models	May 30, 2025	MathMultiple-choice	CodeCode Available	5
MLaKE: Multilingual Knowledge Editing Benchmark for Large Language Models	Apr 7, 2024	Benchmarkingknowledge editing	CodeCode Available	5
Can Model Uncertainty Function as a Proxy for Multiple-Choice Question Item Difficulty?	Jul 7, 2024	Multiple-choice	CodeCode Available	5
A quantitative study of NLP approaches to question difficulty estimation	May 17, 2023	MathMultiple-choice	CodeCode Available	5
Can Large Language Models Provide Security & Privacy Advice? Measuring the Ability of LLMs to Refute Misconceptions	Oct 3, 2023	MisconceptionsMultiple-choice	CodeCode Available	5
A Joint Sequence Fusion Model for Video Question Answering and Retrieval	Aug 7, 2018	DecoderMultiple-choice	CodeCode Available	5
MIRTT: Learning Multimodal Interaction Representations from Trilinear Transformers for Visual Question Answering	Nov 1, 2021	multimodal interactionMultiple-choice	CodeCode Available	5
From Multiple-Choice to Extractive QA: A Case Study for English and Arabic	Apr 26, 2024	BelebeleExtractive Question-Answering	CodeCode Available	5
AILS-NTUA at SemEval-2024 Task 9: Cracking Brain Teasers: Transformer Models for Lateral Thinking Puzzles	Apr 1, 2024	Common Sense ReasoningMultiple-choice	CodeCode Available	5
Sentence Embeddings for Russian NLU	Oct 29, 2019	Multiple-choiceParaphrase Identification	CodeCode Available	5
BUCA: A Binary Classification Approach to Unsupervised Commonsense Question Answering	May 25, 2023	Binary ClassificationKnowledge Graphs	CodeCode Available	5
PROST: Physical Reasoning of Objects through Space and Time	Jun 7, 2021	Multiple-choice	CodeCode Available	5
Answer-level Calibration for Free-form Multiple Choice Question Answering	May 1, 2022	FormLanguage Modeling	CodeCode Available	5
MapEval: A Map-Based Evaluation of Geo-Spatial Reasoning in Foundation Models	Dec 31, 2024	Multiple-choiceQuestion Answering	CodeCode Available	5
MCQG-SRefine: Multiple Choice Question Generation and Evaluation with Iterative Self-Critique, Correction, and Comparison Feedback	Oct 17, 2024	Fact VerificationHallucination	CodeCode Available	5
BnMMLU: Measuring Massive Multitask Language Understanding in Bengali	May 25, 2025	General KnowledgeMMLU	CodeCode Available	5
Look at the Text: Instruction-Tuned Language Models are More Robust Multiple Choice Selectors than You Think	Apr 12, 2024	Multiple-choice	CodeCode Available	5
Measuring Agreeableness Bias in Multimodal Models	Aug 17, 2024	Decision MakingMultiple-choice	CodeCode Available	5
LongHalQA: Long-Context Hallucination Evaluation for MultiModal Large Language Models	Oct 13, 2024	HallucinationHallucination Evaluation	CodeCode Available	5
Biomedical Entity Linking as Multiple Choice Question Answering	Feb 23, 2024	Entity LinkingMultiple-choice	CodeCode Available	5
LLMs Are Not Intelligent Thinkers: Introducing Mathematical Topic Tree Benchmark for Comprehensive Evaluation of LLMs	Jun 7, 2024	Mathematical ReasoningMultiple-choice	CodeCode Available	5
MedArabiQ: Benchmarking Large Language Models on Arabic Medical Tasks	May 6, 2025	BenchmarkingMultiple-choice	CodeCode Available	5
Limited Ability of LLMs to Simulate Human Psychological Behaviours: a Psychometric Analysis	May 12, 2024	Multiple-choiceQuestion Answering	CodeCode Available	5
Leveraging large language models for nano synthesis mechanism explanation: solid foundations or mere conjectures?	Jul 12, 2024	Logical ReasoningMultiple-choice	CodeCode Available	5
LiveQA: A Question Answering Dataset over Sports Live	Oct 1, 2020	Multiple-choiceQuestion Answering	CodeCode Available	5
Eliciting Informative Text Evaluations with Large Language Models	May 23, 2024	Multiple-choicePrediction	CodeCode Available	5
Towards Efficient Methods in Medical Question Answering using Knowledge Graph Embeddings	Jan 15, 2024	Knowledge Graph EmbeddingsKnowledge Graphs	CodeCode Available	5
HSI: Head-Specific Intervention Can Induce Misaligned AI Coordination in Large Language Models	Feb 9, 2025	Answer GenerationLanguage Modeling	CodeCode Available	5
LLaVA-OneVision: Easy Visual Task Transfer	Aug 6, 2024	3D Question Answering (3D-QA)	CodeCode Available	5
LEAVS: An LLM-based Labeler for Abdominal CT Supervision	Mar 17, 2025	AnatomyLarge Language Model	CodeCode Available	5
Edu-Values: Towards Evaluating the Chinese Education Values of Large Language Models	Sep 19, 2024	EthicsMultiple-choice	CodeCode Available	5
A Novel Multi-Stage Prompting Approach for Language Agnostic MCQ Generation using GPT	Jan 13, 2024	Distractor GenerationMultiple-choice	CodeCode Available	5
Length Optimization in Conformal Prediction	Jun 27, 2024	Conformal PredictionLanguage Modeling	CodeCode Available	5
Learning to Correction: Explainable Feedback Generation for Visual Commonsense Reasoning Distractor	Dec 8, 2024	MisconceptionsMultiple-choice	CodeCode Available	5
Learning to Reuse Distractors to support Multiple Choice Question Generation in Education	Oct 25, 2022	Multiple-choiceQuestion Generation	CodeCode Available	5
Learning to Attend On Essential Terms: An Enhanced Retriever-Reader Model for Open-domain Question Answering	Aug 28, 2018	AI2 Reasoning ChallengeARC	CodeCode Available	5
Beyond English-Only Reading Comprehension: Experiments in Zero-Shot Multilingual Transfer for Bulgarian	Aug 5, 2019	Multiple-choicePhilosophy	CodeCode Available	5
EAIRA: Establishing a Methodology for Evaluating AI Models as Scientific Research Assistants	Feb 27, 2025	Multiple-choice	CodeCode Available	5
Leaving the barn door open for Clever Hans: Simple features predict LLM benchmark answers	Oct 15, 2024	Multiple-choice	CodeCode Available	5
DyePack: Provably Flagging Test Set Contamination in LLMs Using Backdoors	May 29, 2025	MMLUMultiple-choice	CodeCode Available	5
BERT-based distractor generation for Swedish reading comprehension questions using a small-scale dataset	Aug 9, 2021	Distractor GenerationMultiple-choice	CodeCode Available	5
SCoRE: Benchmarking Long-Chain Reasoning in Commonsense Scenarios	Mar 8, 2025	BenchmarkingDiagnostic	CodeCode Available	5
DREAM: A Challenge Dataset and Models for Dialogue-Based Reading Comprehension	Feb 1, 2019	Dialogue UnderstandingMultiple-choice	CodeCode Available	5
ElimiNet: A Model for Eliminating Options for Reading Comprehension with Multiple Choice Questions	Apr 4, 2019	Multiple-choiceReading Comprehension	CodeCode Available	5
BertaQA: How Much Do Language Models Know About Local Culture?	Jun 11, 2024	Multiple-choiceTransfer Learning	CodeCode Available	5
EMBRACE: Evaluation and Modifications for Boosting RACE	May 15, 2023	Machine Reading ComprehensionMultiple-choice	CodeCode Available	5
Language Models as Knowledge Bases for Visual Word Sense Disambiguation	Oct 3, 2023	Image CaptioningMultiple-choice	CodeCode Available	5
It's Not Easy Being Wrong: Large Language Models Struggle with Process of Elimination Reasoning	Nov 13, 2023	Multiple-choice	CodeCode Available	5

Show:10 25 50

← PrevPage 7 of 23Next →

No leaderboard results yet.