Multiple-choice

Papers

Showing 101–150 of 1107 papers

Title	Date	Tasks	Status	Hype
ArabicMMLU: Assessing Massive Multitask Language Understanding in Arabic	Feb 20, 2024	ArabicMMLU, Language Model Evaluation	Code Available	1
IntentionQA: A Benchmark for Evaluating Purchase Intention Comprehension Abilities of Language Models in E-commerce	Jun 14, 2024	Multiple-choice, Question Answering	Code Available	1
Knowledge Graph-Augmented Abstractive Summarization with Semantic-Driven Cloze Reward	May 3, 2020	Abstractive Text Summarization, Cloze Test	Code Available	1
Language Models Don't Always Say What They Think: Unfaithful Explanations in Chain-of-Thought Prompting	May 7, 2023	Multiple-choice	Code Available	1
IndicNLPSuite: Monolingual Corpora, Evaluation Benchmarks and Pre-trained Multilingual Language Models for Indian Languages	Nov 8, 2020	Genre classification, Multiple-choice	Code Available	1
Large Language Models Encode Clinical Knowledge	Dec 26, 2022	Clinical Knowledge, MedQA	Code Available	1
All Languages Matter: Evaluating LMMs on Culturally Diverse 100 Languages	Nov 25, 2024	All, Long Question Answer	Code Available	1
JMedLoRA:Medical Domain Adaptation on Japanese Large Language Models using Instruction-tuning	Oct 16, 2023	Domain Adaptation, Medical Question Answering	Code Available	1
Leveraging Large Language Models for Learning Complex Legal Concepts through Storytelling	Feb 26, 2024	Multiple-choice	Code Available	1
Leveraging Large Language Models for Multiple Choice Question Answering	Oct 22, 2022	Answer Selection, Multiple-choice	Code Available	1
LLMs Are Biased Towards Output Formats! Systematically Evaluating and Mitigating Output Format Bias of LLMs	Aug 16, 2024	Instruction Following, Multiple-choice	Code Available	1
Generating Distractors for Reading Comprehension Questions from Real Examinations	Sep 8, 2018	Decoder, Distractor Generation	Code Available	1
Is Bigger and Deeper Always Better? Probing LLaMA Across Scales and Layers	Dec 7, 2023	Math, Multiple-choice	Code Available	1
GPT as Knowledge Worker: A Zero-Shot Evaluation of (AI)CPA Capabilities	Jan 11, 2023	Multiple-choice	Code Available	1
From Machine Reading Comprehension to Dialogue State Tracking: Bridging the Gap	Apr 13, 2020	Dialogue State Tracking, Machine Reading Comprehension	Code Available	1
Fool Your (Vision and) Language Model With Embarrassingly Simple Permutations	Oct 2, 2023	In-Context Learning, Instruction Following	Code Available	1
Long Horizon Temperature Scaling	Feb 7, 2023	Multiple-choice	Code Available	1
General-Purpose Question-Answering with Macaw	Sep 6, 2021	Generative Question Answering, Multiple-choice	Code Available	1
GPT Takes the Bar Exam	Dec 29, 2022	Hyperparameter Optimization, Multiple-choice	Code Available	1
A Hitchhikers Guide to Fine-Grained Face Forgery Detection Using Common Sense Reasoning	Oct 1, 2024	Common Sense Reasoning, DeepFake Detection	Code Available	1
Benchmarking Large Language Models on CMExam -- A Comprehensive Chinese Medical Exam Dataset	Jun 5, 2023	Benchmarking, Multiple-choice	Code Available	1
Assessing the Chemical Intelligence of Large Language Models	May 12, 2025	Multiple-choice	Code Available	1
BiMediX: Bilingual Medical Mixture of Experts LLM	Feb 20, 2024	Mixture-of-Experts, Multiple-choice	Code Available	1
FoodieQA: A Multimodal Dataset for Fine-Grained Understanding of Chinese Food Culture	Jun 16, 2024	Diversity, Multiple-choice	Code Available	1
MILU: A Multi-task Indic Language Understanding Benchmark	Nov 4, 2024	Multiple-choice, Question Answering	Code Available	1
MindGames: Targeting Theory of Mind in Large Language Models with Dynamic Epistemic Modal Logic	May 5, 2023	Epistemic Reasoning, Language Modeling	Code Available	1
Fine-tuning Multimodal Large Language Models for Product Bundling	Jul 16, 2024	In-Context Learning, Multiple-choice	Code Available	1
Fake Alignment: Are LLMs Really Aligned Well?	Nov 10, 2023	Multiple-choice	Code Available	1
FaceXBench: Evaluating Multimodal LLMs on Face Understanding	Jan 17, 2025	Fairness, Multiple-choice	Code Available	1
FarsTail: A Persian Natural Language Inference Dataset	Sep 18, 2020	Multiple-choice, Natural Language Inference	Code Available	1
Explicit Planning Helps Language Models in Logical Reasoning	Mar 28, 2023	Logical Reasoning, Multiple-choice	Code Available	1
Ranked Voting based Self-Consistency of Large Language Models	May 16, 2025	Multiple-choice, Open-Ended Question Answering	Code Available	1
FETA: A Benchmark for Few-Sample Task Transfer in Open-Domain Dialogue	May 12, 2022	Dialogue Understanding, Domain Adaptation	Code Available	1
An Open Source Data Contamination Report for Large Language Models	Oct 26, 2023	HellaSwag, Language Modeling	Code Available	1
Annealed Winner-Takes-All for Motion Forecasting	Sep 17, 2024	All, Autonomous Driving	Code Available	1
ExplaGraphs: An Explanation Graph Generation Task for Structured Commonsense Reasoning	Apr 15, 2021	Graph Generation, Multiple-choice	Code Available	1
Annealed Multiple Choice Learning: Overcoming limitations of Winner-takes-all with annealing	Jul 22, 2024	All, Diversity	Code Available	1
Evaluating the Knowledge Dependency of Questions	Nov 21, 2022	Multiple-choice	Code Available	1
Explaining NLP Models via Minimal Contrastive Editing (MiCE)	Dec 27, 2020	counterfactual, Multiple-choice	Code Available	1
Filter-then-Generate: Large Language Models with Structure-Text Adapter for Knowledge Graph Completion	Dec 12, 2024	Hallucination, Knowledge Graph Completion	Code Available	1
HCQA @ Ego4D EgoSchema Challenge 2024	Jun 22, 2024	Caption Generation	Code Available	1
An MRC Framework for Semantic Role Labeling	Sep 14, 2021	Computational Efficiency, Machine Reading Comprehension	Code Available	1
African or European Swallow? Benchmarking Large Vision-Language Models for Fine-Grained Object Classification	Jun 20, 2024	Benchmarking, Classification	Code Available	1
Estimating Contamination via Perplexity: Quantifying Memorisation in Language Model Evaluation	Sep 19, 2023	Language Model Evaluation, Language Modeling	Code Available	1
An In-depth Look at Gemini's Language Abilities	Dec 18, 2023	Instruction Following, Math	Code Available	1
Enhancing Human-like Multi-Modal Reasoning: A New Challenging Dataset and Comprehensive Framework	Jul 24, 2023	Contrastive Learning, Multimodal Reasoning	Code Available	1
Enhancing Knowledge Tracing with Concept Map and Response Disentanglement	Aug 23, 2024	Disentanglement, Knowledge Tracing	Code Available	1
Evaluating GPT-3.5 and GPT-4 Models on Brazilian University Admission Exams	Mar 29, 2023	Multiple-choice	Code Available	1
EduQG: A Multi-format Multiple Choice Dataset for the Educational Domain	Oct 12, 2022	Distractor Generation, Multiple-choice	Code Available	1
Do Large Language Models Understand Conversational Implicature -- A case study with a chinese sitcom	Apr 30, 2024	Implicatures, Multiple-choice	Code Available	1


No leaderboard results yet.