Multiple-choice

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 176–200 of 1107 papers

Title	Date	Tasks	Status	Hype	Score
Data Contamination Quiz: A Tool to Detect and Estimate Contamination in Large Language Models	Nov 10, 2023	GSM8KMemorization	CodeCode Available	1	5
BiMediX: Bilingual Medical Mixture of Experts LLM	Feb 20, 2024	Mixture-of-ExpertsMultiple-choice	CodeCode Available	1	5
E-EVAL: A Comprehensive Chinese K-12 Education Evaluation Benchmark for Large Language Models	Jan 29, 2024	EthicsMultiple-choice	CodeCode Available	1	5
Latxa: An Open Language Model and Evaluation Suite for Basque	Mar 29, 2024	Language ModelingLanguage Modelling	CodeCode Available	1	5
Let Androids Dream of Electric Sheep: A Human-like Image Implication Understanding and Reasoning Framework	May 22, 2025	Multiple-choiceVisual Question Answering (VQA)	CodeCode Available	1	5
LLMEval-Med: A Real-world Clinical Benchmark for Medical LLMs with Physician Validation	Jun 4, 2025	Multiple-choice	CodeCode Available	1	5
BLEnD: A Benchmark for LLMs on Everyday Knowledge in Diverse Cultures and Languages	Jun 14, 2024	Multiple-choice	CodeCode Available	1	5
JMedLoRA:Medical Domain Adaptation on Japanese Large Language Models using Instruction-tuning	Oct 16, 2023	Domain AdaptationMedical Question Answering	CodeCode Available	1	5
A Few More Examples May Be Worth Billions of Parameters	Oct 8, 2021	Extractive Question-AnsweringMultiple-choice	CodeCode Available	1	5
Automated Generation of Challenging Multiple-Choice Questions for Vision Language Model Evaluation	Jan 6, 2025	Language Model EvaluationLanguage Modeling	CodeCode Available	1	5
MMIE: Massive Multimodal Interleaved Comprehension Benchmark for Large Vision-Language Models	Oct 14, 2024	Multiple-choice	CodeCode Available	1	5
Boosting Healthcare LLMs Through Retrieved Context	Sep 23, 2024	BenchmarkingMultiple-choice	CodeCode Available	1	5
BRAINTEASER: Lateral Thinking Puzzles for Large Language Models	Oct 8, 2023	Distractor GenerationLanguage Modelling	CodeCode Available	1	5
Clues Before Answers: Generation-Enhanced Multiple-Choice QA	Apr 30, 2022	DecoderMultiple-choice	CodeCode Available	1	5
Bridging Video-text Retrieval with Multiple Choice Questions	Jan 13, 2022	Action RecognitionLinear evaluation	CodeCode Available	1	5
Knowledge Graph-Augmented Abstractive Summarization with Semantic-Driven Cloze Reward	May 3, 2020	Abstractive Text SummarizationCloze Test	CodeCode Available	1	5
ExplaGraphs: An Explanation Graph Generation Task for Structured Commonsense Reasoning	Apr 15, 2021	Graph GenerationMultiple-choice	CodeCode Available	1	5
IntentionQA: A Benchmark for Evaluating Purchase Intention Comprehension Abilities of Language Models in E-commerce	Jun 14, 2024	Multiple-choiceQuestion Answering	CodeCode Available	1	5
Multiple-Choice Questions are Efficient and Robust LLM Evaluators	May 20, 2024	GSM8KHumanEval	CodeCode Available	1	5
Explaining NLP Models via Minimal Contrastive Editing (MiCE)	Dec 27, 2020	counterfactualMultiple-choice	CodeCode Available	1	5
A BERT-based Distractor Generation Scheme with Multi-tasking and Negative Answer Training Strategies.	Nov 1, 2020	Distractor GenerationMultiple-choice	CodeCode Available	1	5
NewsBench: A Systematic Evaluation Framework for Assessing Editorial Capabilities of Large Language Models in Chinese Journalism	Feb 29, 2024	EthicsMultiple-choice	CodeCode Available	1	5
Explicit Planning Helps Language Models in Logical Reasoning	Mar 28, 2023	Logical ReasoningMultiple-choice	CodeCode Available	1	5
AutoLogi: Automated Generation of Logic Puzzles for Evaluating Reasoning Abilities of Large Language Models	Feb 24, 2025	Logical ReasoningMultiple-choice	CodeCode Available	1	5
IRLBench: A Multi-modal, Culturally Grounded, Parallel Irish-English Benchmark for Open-Ended LLM Reasoning Evaluation	May 16, 2025	Multiple-choice	CodeCode Available	1	5

Show:10 25 50

← PrevPage 8 of 45Next →

No leaderboard results yet.