Multiple-choice

Papers

Showing 251–275 of 1107 papers

Title	Date	Tasks	Status	Hype	Score
CHOICE: Benchmarking the Remote Sensing Capabilities of Large Vision-Language Models	Nov 27, 2024	Benchmarking, Earth Observation	Code Available	1	5
MedQA-CS: Benchmarking Large Language Models Clinical Skills Using an AI-SCE Framework	Oct 2, 2024	Benchmarking, Instruction Following	Code Available	1	5
Data Contamination Quiz: A Tool to Detect and Estimate Contamination in Large Language Models	Nov 10, 2023	GSM8K, Memorization	Code Available	1	5
Delving into the Reversal Curse: How Far Can Large Language Models Generalize?	Oct 24, 2024	Multiple-choice	Code Available	1	5
Mind the Confidence Gap: Overconfidence, Calibration, and Distractor Effects in Large Language Models	Feb 16, 2025	Multiple-choice	Code Available	1	5
NextLevelBERT: Masked Language Modeling with Higher-Level Representations for Long Documents	Feb 27, 2024	Document Classification, Language Modeling	Code Available	1	5
Political Compass or Spinning Arrow? Towards More Meaningful Evaluations for Values and Opinions in Large Language Models	Feb 26, 2024	Multiple-choice	Code Available	1	5
A Study on Large Language Models' Limitations in Multiple-Choice Question Answering	Jan 15, 2024	Multiple-choice, Question Answering	Code Available	0	5
Look at the Text: Instruction-Tuned Language Models are More Robust Multiple Choice Selectors than You Think	Apr 12, 2024	Multiple-choice	Code Available	0	5
MapEval: A Map-Based Evaluation of Geo-Spatial Reasoning in Foundation Models	Dec 31, 2024	Multiple-choice, Question Answering	Code Available	0	5
Analogical Reasoning Inside Large Language Models: Concept Vectors and the Limits of Abstraction	Mar 5, 2025	In-Context Learning, Multiple-choice	Code Available	0	5
LongHalQA: Long-Context Hallucination Evaluation for MultiModal Large Language Models	Oct 13, 2024	Hallucination, Hallucination Evaluation	Code Available	0	5
Confident Multiple Choice Learning	Jun 12, 2017	General Classification, image-classification	Code Available	0	5
Assessing the Quality of Multiple-Choice Questions Using GPT-4 and Rule-Based Methods	Jul 16, 2023	Multiple-choice	Code Available	0	5
LLMs Are Not Intelligent Thinkers: Introducing Mathematical Topic Tree Benchmark for Comprehensive Evaluation of LLMs	Jun 7, 2024	Mathematical Reasoning, Multiple-choice	Code Available	0	5
LLaVA-OneVision: Easy Visual Task Transfer	Aug 6, 2024	3D Question Answering (3D-QA)	Code Available	0	5
LiveQA: A Question Answering Dataset over Sports Live	Oct 1, 2020	Multiple-choice, Question Answering	Code Available	0	5
COLUMBUS: Evaluating COgnitive Lateral Understanding through Multiple-choice reBUSes	Sep 6, 2024	Multiple-choice, Question Answering	Code Available	0	5
MCQG-SRefine: Multiple Choice Question Generation and Evaluation with Iterative Self-Critique, Correction, and Comparison Feedback	Oct 17, 2024	Fact Verification, Hallucination	Code Available	0	5
A Simple Method for Commonsense Reasoning	Jun 7, 2018	Common Sense Reasoning, Coreference Resolution	Code Available	0	5
Leveraging large language models for nano synthesis mechanism explanation: solid foundations or mere conjectures?	Jul 12, 2024	Logical Reasoning, Multiple-choice	Code Available	0	5
Towards Efficient Methods in Medical Question Answering using Knowledge Graph Embeddings	Jan 15, 2024	Knowledge Graph Embeddings, Knowledge Graphs	Code Available	0	5
A Benchmark for Long-Form Medical Question Answering	Nov 14, 2024	Answer Generation, Form	Code Available	0	5
Length Optimization in Conformal Prediction	Jun 27, 2024	Conformal Prediction, Language Modeling	Code Available	0	5
CNN for Text-Based Multiple Choice Question Answering	Jul 1, 2018	Multiple-choice, Question Answering	Code Available	0	5

No leaderboard results yet.