Multiple-choice

Papers

Showing 701–750 of 1107 papers

Title	Date	Tasks	Status
Is This Collection Worth My LLM's Time? Automatically Measuring Information Potential in Text Corpora	Feb 19, 2025	Articles, Multiple-choice	Unverified
An Algorithm for Generating Gap-Fill Multiple Choice Questions of an Expert System	Sep 17, 2021	Multiple-choice, software testing	Unverified
It is Too Many Options: Pitfalls of Multiple-Choice Questions in Generative AI and Medical Education	Mar 13, 2025	Multiple-choice	Unverified
Winning Amazon KDD Cup'24	Aug 5, 2024	Data Augmentation, Multiple-choice	Unverified
KMMLU: Measuring Massive Multitask Language Understanding in Korean	Feb 18, 2024	kmmlu, Language Model Evaluation	Unverified
Knowledge-Driven Distractor Generation for Cloze-style Multiple Choice Questions	Apr 21, 2020	Distractor Generation, Learning-To-Rank	Unverified
Knowledge Questions from Knowledge Graphs	Oct 31, 2016	Knowledge Graphs, Multiple-choice	Unverified
Knowledge Retrieval Based on Generative AI	Jan 8, 2025	Large Language Model, Multiple-choice	Unverified
KoBALT: Korean Benchmark For Advanced Linguistic Tasks	May 22, 2025	Multiple-choice	Unverified
KorMedMCQA: Multi-Choice Question Answering Benchmark for Korean Healthcare Professional Licensing Examinations	Mar 3, 2024	MedQA, MMLU	Unverified
KorNAT: LLM Alignment Benchmark for Korean Social Values and Common Knowledge	Feb 21, 2024	4k, Multiple-choice	Unverified
KRISTEVA: Close Reading as a Novel Task for Benchmarking Interpretive Reasoning	May 14, 2025	Benchmarking, MMLU	Unverified
LAB-Bench: Measuring Capabilities of Language Models for Biology Research	Jul 14, 2024	Language Modelling, Multiple-choice	Unverified
LabSafety Bench: Benchmarking LLMs on Safety Issues in Scientific Labs	Oct 18, 2024	Benchmarking, Fairness	Unverified
Language Enhanced Model for Eye (LEME): An Open-Source Ophthalmology-Specific Large Language Model	Oct 1, 2024	All, Language Modeling	Unverified
Language models are susceptible to incorrect patient self-diagnosis in medical applications	Sep 17, 2023	Diagnostic, Multiple-choice	Unverified
Uncovering Cultural Representation Disparities in Vision-Language Models	May 20, 2025	Multiple-choice	Unverified
Language Models (Mostly) Know What They Know	Jul 11, 2022	Multiple-choice	Unverified
Uncovering Temporal Context for Video Question and Answering	Nov 15, 2015	Decoder, Multiple-choice	Unverified
LAR-ECHR: A New Legal Argument Reasoning Task and Dataset for Cases of the European Court of Human Rights	Oct 17, 2024	Legal Reasoning, Multiple-choice	Unverified
Large Language Models Are Self-Taught Reasoners: Enhancing LLM Applications via Tailored Problem-Solving Demonstrations	Aug 22, 2024	Multiple-choice	Unverified
Large Language Models Could Be Rote Learners	Apr 11, 2025	Memorization, MMLU	Unverified
Understanding Dataset Design Choices for Multi-hop Reasoning	Apr 27, 2019	Multi-hop Question Answering, Multiple-choice	Unverified
Large Language Models (GPT) Struggle to Answer Multiple-Choice Questions about Code	Mar 9, 2023	Multiple-choice	Unverified
Large Language Models Often Know When They Are Being Evaluated	May 28, 2025	MMLU, Multiple-choice	Unverified
Large Language Models Sensitivity to The Order of Options in Multiple-Choice Questions	Aug 22, 2023	Multiple-choice, Sensitivity	Unverified
Large Language Models Still Exhibit Bias in Long Text	Oct 23, 2024	Fairness, Multiple-choice	Unverified
A Comparative Study of Open-Source Large Language Models, GPT-4 and Claude 2: Multiple-Choice Test Taking in Nephrology	Aug 9, 2023	Multiple-choice	Unverified
Understanding Prior Bias and Choice Paralysis in Transformer-based Language Representation Models through Four Experimental Probes	Oct 3, 2022	Decision Making, Multiple-choice	Unverified
Learning a Word-Level Language Model with Sentence-Level Noise Contrastive Estimation for Contextual Sentence Probability Estimation	Mar 14, 2021	Language Modeling, Language Modelling	Unverified
Learning Language-Visual Embedding for Movie Understanding with Natural-Language	Sep 26, 2016	Multiple-choice, Retrieval	Unverified
Learning Models for Actions and Person-Object Interactions with Transfer to Question Answering	Apr 16, 2016	General Classification, Human-Object Interaction Detection	Unverified
Learning to Specialize with Knowledge Distillation for Visual Question Answering	Dec 1, 2018	General Classification, General Knowledge	Unverified
An AI-based Solution for Enhancing Delivery of Digital Learning for Future Teachers	Nov 9, 2021	Multiple-choice, Question Generation	Unverified
LegalBench.PT: A Benchmark for Portuguese Law	Feb 22, 2025	Multiple-choice	Unverified
Teaching Pretrained Models with Commonsense Reasoning: A Preliminary KB-Based Approach	Sep 20, 2019	Few-Shot Learning, Logical Reasoning	Unverified
WIQA: A dataset for "What if..." reasoning over procedural text	Nov 1, 2019	Multiple-choice	Unverified
LEXam: Benchmarking Legal Reasoning on 340 Law Exams	May 19, 2025	Benchmarking, Legal Reasoning	Unverified
LHMKE: A Large-scale Holistic Multi-subject Knowledge Evaluation Benchmark for Chinese Large Language Models	Mar 19, 2024	Multiple-choice	Unverified
WirelessMathBench: A Mathematical Modeling Benchmark for LLMs in Wireless Communications	May 20, 2025	Mathematical Reasoning, Multiple-choice	Unverified
Linguistic Legal Concept Extraction in Portuguese	Oct 22, 2018	Ethics, Multiple-choice	Unverified
Listening to the Wise Few: Select-and-Copy Attention Heads for Multiple-Choice QA	Oct 3, 2024	Multiple-choice, Question Answering	Unverified
LLaMa-SciQ: An Educational Chatbot for Answering Science MCQ	Sep 25, 2024	Chatbot, GSM8K	Unverified
LLM-as-a-Judge & Reward Model: What They Can and Cannot Do	Sep 17, 2024	Language Modeling, Language Modelling	Unverified
LLM-based Text Simplification and its Effect on User Comprehension and Cognitive Load	May 4, 2025	Articles, Multiple-choice	Unverified
LLM Distillation for Efficient Few-Shot Multiple Choice Question Answering	Dec 13, 2024	Few-Shot Learning, Knowledge Distillation	Unverified
Unlearning vs. Obfuscation: Are We Truly Removing Knowledge?	May 5, 2025	Multiple-choice	Unverified
LLM Evaluation Based on Aerospace Manufacturing Expertise: Automated Generation and Multi-Model Question Answering	Jan 25, 2025	Information Retrieval, Multiple-choice	Unverified
Unleashing the Potential of Large Language Model: Zero-shot VQA for Flood Disaster Scenario	Dec 4, 2023	Language Modeling, Language Modelling	Unverified
LLMs to Support a Domain Specific Knowledge Assistant	Feb 6, 2025	Chatbot, Multiple-choice	Unverified
