Multiple-choice

Papers

Showing 226–250 of 1107 papers

Title	Date	Tasks	Status	Hype
Enhancing Knowledge Tracing with Concept Map and Response Disentanglement	Aug 23, 2024	Disentanglement, Knowledge Tracing	Code Available	1
Enhancing Human-like Multi-Modal Reasoning: A New Challenging Dataset and Comprehensive Framework	Jul 24, 2023	Contrastive Learning, Multimodal Reasoning	Code Available	1
Taming Overconfidence in LLMs: Reward Calibration in RLHF	Oct 13, 2024	Multiple-choice	Code Available	1
Clues Before Answers: Generation-Enhanced Multiple-Choice QA	Apr 30, 2022	Decoder, Multiple-choice	Code Available	1
Estimating Contamination via Perplexity: Quantifying Memorisation in Language Model Evaluation	Sep 19, 2023	Language Model Evaluation, Language Modeling	Code Available	1
Evaluating the Knowledge Dependency of Questions	Nov 21, 2022	Multiple-choice	Code Available	1
HyKGE: A Hypothesis Knowledge Graph Enhanced Framework for Accurate and Reliable Medical LLMs Responses	Dec 26, 2023	Diversity, Knowledge Graphs	Code Available	1
TIMEDIAL: Temporal Commonsense Reasoning in Dialog	Jun 8, 2021	Multiple-choice, Timedial	Code Available	1
CodeApex: A Bilingual Programming Evaluation Benchmark for Large Language Models	Sep 5, 2023	Code Generation, Multiple-choice	Code Available	1
EduQG: A Multi-format Multiple Choice Dataset for the Educational Domain	Oct 12, 2022	Distractor Generation, Multiple-choice	Code Available	1
ToMATO: Verbalizing the Mental States of Role-Playing LLMs for Benchmarking Theory of Mind	Jan 15, 2025	Benchmarking, Multiple-choice	Code Available	1
E-EVAL: A Comprehensive Chinese K-12 Education Evaluation Benchmark for Large Language Models	Jan 29, 2024	Ethics, Multiple-choice	Code Available	1
TourSynbio: A Multi-Modal Large Model and Agent Framework to Bridge Text and Protein Sequences for Protein Engineering	Aug 27, 2024	Multiple-choice, Protein Folding	Code Available	1
CoLoR-Filter: Conditional Loss Reduction Filtering for Targeted Language Model Pre-training	Jun 15, 2024	Domain Adaptation, Language Modeling	Code Available	1
A BERT-based Distractor Generation Scheme with Multi-tasking and Negative Answer Training Strategies	Oct 12, 2020	Distractor Generation, Multiple-choice	Code Available	1
TSQA: Tabular Scenario Based Question Answering	Jan 14, 2021	Machine Reading Comprehension, Multiple-choice	Code Available	1
TUMTraffic-VideoQA: A Benchmark for Unified Spatio-Temporal Video Understanding in Traffic Scenes	Feb 4, 2025	Autonomous Driving, Multiple-choice	Code Available	1
Counterfactual Variable Control for Robust and Interpretable Question Answering	Oct 12, 2020	Causal Inference, counterfactual	Code Available	1
Uncertainty is Fragile: Manipulating Uncertainty in Large Language Models	Jul 15, 2024	Backdoor Attack, Multiple-choice	Code Available	1
Complex Reasoning over Logical Queries on Commonsense Knowledge Graphs	Mar 12, 2024	Knowledge Graphs, Multiple-choice	Code Available	1
Assessing the Chemical Intelligence of Large Language Models	May 12, 2025	Multiple-choice	Code Available	1
Unsupervised Commonsense Question Answering with Self-Talk	Apr 11, 2020	Language Modeling, Language Modelling	Code Available	1
Conformal Prediction with Large Language Models for Multi-Choice Question Answering	May 28, 2023	Conformal Prediction, Multiple-choice	Code Available	1
Do Large Language Models Understand Conversational Implicature -- A case study with a chinese sitcom	Apr 30, 2024	Implicatures, Multiple-choice	Code Available	1
EgoSchema: A Diagnostic Benchmark for Very Long-form Video Language Understanding	Aug 17, 2023	Diagnostic, EgoSchema	Code Available	1

