Multiple-choice

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 301–325 of 1107 papers

Title	Date	Tasks	Status
Explore then Determine: A GNN-LLM Synergy Framework for Reasoning over Knowledge Graph	Jun 3, 2024	Knowledge GraphsMultiple-choice	—Unverified
Exposing the Limits of Video-Text Models through Contrast Sets	Jan 16, 2022	Language ModelingLanguage Modelling	—Unverified
Cleared for Takeoff? Compositional & Conditional Reasoning may be the Achilles Heel to (Flight-Booking) Language Agents	Apr 5, 2024	Multiple-choiceNavigate	—Unverified
CinePile: A Long Video Question Answering Dataset and Benchmark	May 14, 2024	FormHuman-Object Interaction Detection	—Unverified
ARGUS: Hallucination and Omission Evaluation in Video-LLMs	Jun 9, 2025	DescriptiveForm	—Unverified
Chat-TS: Enhancing Multi-Modal Reasoning Over Time-Series and Natural Language Data	Mar 13, 2025	Large Language ModelMath	—Unverified
Are You Doubtful? Oh, It Might Be Difficult Then! Exploring the Use of Model Uncertainty for Question Difficulty Estimation	Dec 16, 2024	Multiple-choice	—Unverified
AmazUtah_NLP at SemEval-2024 Task 9: A MultiChoice Question Answering System for Commonsense Defying Reasoning	May 16, 2024	Multiple-choiceQuestion Answering	—Unverified
A review of faithfulness metrics for hallucination assessment in Large Language Models	Dec 31, 2024	BenchmarkingHallucination	—Unverified
Adaptive Wizard for Removing Cross-Tier Misconfigurations in Active Directory	May 2, 2025	Multiple-choice	—Unverified
Characterizing Large Language Models as Rationalizers of Knowledge-intensive Tasks	Nov 9, 2023	Multiple-choiceWorld Knowledge	—Unverified
Changing Answer Order Can Decrease MMLU Accuracy	Jun 27, 2024	MMLUMultiple-choice	—Unverified
Evaluating the Symbol Binding Ability of Large Language Models for Multiple-Choice Questions in Vietnamese General Education	Oct 18, 2023	Multiple-choiceMultiple Choice Question Answering (MCQA)	—Unverified
Evaluating Vision-Language and Large Language Models for Automated Student Assessment in Indonesian Classrooms	Jun 5, 2025	Multiple-choice	—Unverified
Are LLM-generated plain language summaries truly understandable? A large-scale crowdsourced evaluation	May 15, 2025	InformativenessMultiple-choice	—Unverified
CG-Bench: Clue-grounded Question Answering Benchmark for Long Video Understanding	Dec 16, 2024	HallucinationMultiple-choice	—Unverified
Adaptive Crowdsourcing Algorithms for the Bandit Survey Problem	Feb 13, 2013	Information RetrievalMultiple-choice	—Unverified
CFinBench: A Comprehensive Chinese Financial Benchmark for Large Language Models	Jul 2, 2024	Multiple-choice	—Unverified
Evaluating the Potential of Leading Large Language Models in Reasoning Biology Questions	Nov 5, 2023	Logical ReasoningMultiple-choice	—Unverified
CBT-Bench: Evaluating Large Language Models on Assisting Cognitive Behavior Therapy	Oct 17, 2024	Multiple-choiceResponse Generation	—Unverified
A recent evaluation on the performance of LLMs on radiation oncology physics using questions of randomly shuffled options	Dec 14, 2024	Multiple-choice	—Unverified
AlignMMBench: Evaluating Chinese Multimodal Alignment in Large Vision-Language Models	Jun 13, 2024	Multiple-choice	—Unverified
AraTrust: An Evaluation of Trustworthiness for LLMs in Arabic	Mar 14, 2024	EthicsMultiple-choice	—Unverified
Recent Advances in Multi-Choice Machine Reading Comprehension: A Survey on Methods and Datasets	Aug 4, 2024	Few-Shot LearningMachine Reading Comprehension	—Unverified
Evaluating the Performance and Robustness of LLMs in Materials Science Q&A and Property Predictions	Sep 22, 2024	Band GapIn-Context Learning	—Unverified

Show:10 25 50

← PrevPage 13 of 45Next →

No leaderboard results yet.