Multiple-choice

Papers

Showing 651–700 of 1107 papers

Title	Date	Tasks	Status
Predicting the Difficulty of Multiple Choice Questions in a High-stakes Medical Exam	Aug 1, 2019	Multiple-choice, Question Answering	Unverified
Predictions from language models for multiple-choice tasks are not robust under variation of scoring methods	Mar 1, 2024	Multiple-choice	Unverified
Probabilistic Consensus through Ensemble Validation: A Framework for LLM Reliability	Nov 10, 2024	Multiple-choice, Text Generation	Unverified
Prompt Engineering and Calibration for Zero-Shot Commonsense Reasoning	Apr 14, 2023	Multiple-choice, Prompt Engineering	Unverified
Prompting Implicit Discourse Relation Annotation	Feb 7, 2024	Classification, Implicit Discourse Relation Classification	Unverified
Instruction Fine-Tuning: Does Prompt Loss Matter?	Jan 24, 2024	Multiple-choice, token-classification	Unverified
ProverbEval: Exploring LLM Evaluation Challenges for Low-resource Language Understanding	Nov 7, 2024	Benchmarking, Multiple-choice	Unverified
ConceptPsy: A Benchmark Suite with Conceptual Comprehensiveness in Psychology	Nov 16, 2023	MMLU, Multiple-choice	Unverified
PUB: A Pragmatics Understanding Benchmark for Assessing LLMs' Pragmatics Capabilities	Jan 13, 2024	Instruction Following, Multiple-choice	Unverified
Q-Bench-Video: Benchmarking the Video Quality Understanding of LMMs	Sep 30, 2024	Benchmarking, Multiple-choice	Unverified
Q-Bench-Video: Benchmark the Video Quality Understanding of LMMs	Jan 1, 2025	Multiple-choice, Video Generation	Unverified
QOG: Question and Options Generation based on Language Model	Jun 18, 2024	Information Retrieval, Language Modeling	Unverified
QRMeM: Unleash the Length Limitation through Question then Reflection Memory Mechanism	Jun 19, 2024	Multiple-choice, Question Answering	Unverified
VisNumBench: Evaluating Number Sense of Multimodal Large Language Models	Mar 19, 2025	Multiple-choice	Unverified
Query Rewriting for Retrieval-Augmented Large Language Models	May 23, 2023	Language Modeling, Language Modelling	Unverified
Question Difficulty Ranking for Multiple-Choice Reading Comprehension	Apr 16, 2024	Multiple-choice, Reading Comprehension	Unverified
Question-type Identification for Academic Questions in Online Learning Platform	Nov 24, 2022	Binary Classification, Multiple-choice	Unverified
Visual7W: Grounded Question Answering in Images	Nov 11, 2015	Multiple-choice, Multiple Choice Question Answering (MCQA)	Unverified
Ranking Facts for Explaining Answers to Elementary Science Questions	Oct 18, 2021	Interpretable Machine Learning, Learning-To-Rank	Unverified
Ranking Large Language Models without Ground Truth	Feb 21, 2024	Multiple-choice, Triplet	Unverified
Read, Retrospect, Select: An MRC Framework to Short Text Entity Linking	Jan 7, 2021	Entity Linking, Machine Reading Comprehension	Unverified
RECAP-KG: Mining Knowledge Graphs from Raw GP Notes for Remote COVID-19 Assessment in Primary Care	Jun 17, 2023	Decision Making, graph construction	Unverified
Receptivity of an AI Cognitive Assistant by the Radiology Community: A Report on Data Collected at RSNA	Sep 13, 2020	Multiple-choice, Question Answering	Unverified
Recurrent and Contextual Models for Visual Question Answering	Mar 23, 2017	Diversity, Multiple-choice	Unverified
Visual Madlibs: Fill in the Blank Description Generation and Question Answering	Dec 1, 2015	Multiple-choice, Question Answering	Unverified
Rethinking AI Cultural Alignment	Jan 13, 2025	Multiple-choice	Unverified
Rethinking Generative Large Language Model Evaluation for Semantic Comprehension	Mar 12, 2024	Language Model Evaluation, Language Modeling	Unverified
Reusing Swedish FrameNet for training semantic roles	May 1, 2014	Multiple-choice	Unverified
Reversal Blessing: Thinking Backward May Outpace Thinking Forward in Multi-choice Questions	Feb 25, 2025	Inductive Bias, Logical Reasoning	Unverified
RiddleSense: Reasoning about Riddle Questions Featuring Linguistic Creativity and Commonsense Knowledge	Jan 2, 2021	counterfactual, Counterfactual Reasoning	Unverified
RISCORE: Enhancing In-Context Riddle Solving in Language Models through Context-Reconstructed Example Augmentation	Sep 24, 2024	Multiple-choice, Sentence	Unverified
R-LLaVA: Improving Med-VQA Understanding through Visual Region of Interest	Oct 27, 2024	Medical Visual Question Answering, Multiple-choice	Unverified
Robo2VLM: Visual Question Answering from Large-Scale In-the-Wild Robot Manipulation Datasets	May 21, 2025	Dataset Generation, Descriptive	Unverified
Robust portfolio optimization model for electronic coupon allocation	May 21, 2024	Multiple-choice, Portfolio Optimization	Unverified
Visual Madlibs: Fill in the blank Image Generation and Question Answering	May 31, 2015	Image Generation, Multiple-choice	Unverified
SafePath: Conformal Prediction for Safe LLM-Based Autonomous Navigation	May 14, 2025	Autonomous Driving, Autonomous Navigation	Unverified
Adversarial Training for Machine Reading Comprehension with Virtual Embeddings	Jun 8, 2021	Machine Reading Comprehension, Multiple-choice	Unverified
SAGEval: The frontiers of Satisfactory Agent based NLG Evaluation for reference-free open-ended text	Nov 25, 2024	Language Modeling, Language Modelling	Unverified
Visual Question Answering as Reading Comprehension	Nov 29, 2018	Common Sense Reasoning, General Knowledge	Unverified
Adversarial Databases Improve Success in Retrieval-based Large Language Models	Jul 19, 2024	Multiple-choice, RAG	Unverified
SaL-Lightning Dataset: Search and Eye Gaze Behavior, Resource Interactions and Knowledge Gain during Web Search	Jan 7, 2022	Information Retrieval, Multiple-choice	Unverified
Sample then Identify: A General Framework for Risk Control and Assessment in Multimodal Large Language Models	Oct 10, 2024	Conformal Prediction, Language Modeling	Unverified
SARI: Structured Audio Reasoning via Curriculum-Guided Reinforcement Learning	Apr 22, 2025	Multiple-choice, reinforcement-learning	Unverified
SaudiCulture: A Benchmark for Evaluating Large Language Models Cultural Competence within Saudi Arabia	Mar 21, 2025	Multiple-choice	Unverified
SB-Bench: Stereotype Bias Benchmark for Large Multimodal Models	Feb 12, 2025	Fairness, Multiple-choice	Unverified
SceMQA: A Scientific College Entrance Level Multimodal Question Answering Benchmark	Feb 6, 2024	Multiple-choice, Question Answering	Unverified
Scene Restoring for Narrative Machine Reading Comprehension	Nov 1, 2020	Cloze Test, Machine Reading Comprehension	Unverified
Scheduling Algorithms for Federated Learning with Minimal Energy Consumption	Sep 13, 2022	Federated Learning, Multiple-choice	Unverified
VITAL: A New Dataset for Benchmarking Pluralistic Alignment in Healthcare	Feb 19, 2025	Benchmarking, Diversity	Unverified
GeoSQA: A Benchmark for Scenario-based Question Answering in the Geography Domain at High School Level	Aug 20, 2019	General Knowledge, Multiple-choice	Unverified

