Multiple-choice

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 301–350 of 1107 papers

Title	Date	Tasks	Status
First Token Probability Guided RAG for Telecom Question Answering	Jan 11, 2025	Multiple-choiceMultiple Choice Question Answering (MCQA)	—Unverified
From Generalist to Specialist: Improving Large Language Models for Medical Physics Using ARCoT	May 17, 2024	BenchmarkingMultiple-choice	—Unverified
Cleared for Takeoff? Compositional & Conditional Reasoning may be the Achilles Heel to (Flight-Booking) Language Agents	Apr 5, 2024	Multiple-choiceNavigate	—Unverified
CinePile: A Long Video Question Answering Dataset and Benchmark	May 14, 2024	FormHuman-Object Interaction Detection	—Unverified
ARGUS: Hallucination and Omission Evaluation in Video-LLMs	Jun 9, 2025	DescriptiveForm	—Unverified
Chat-TS: Enhancing Multi-Modal Reasoning Over Time-Series and Natural Language Data	Mar 13, 2025	Large Language ModelMath	—Unverified
Are You Doubtful? Oh, It Might Be Difficult Then! Exploring the Use of Model Uncertainty for Question Difficulty Estimation	Dec 16, 2024	Multiple-choice	—Unverified
AmazUtah_NLP at SemEval-2024 Task 9: A MultiChoice Question Answering System for Commonsense Defying Reasoning	May 16, 2024	Multiple-choiceQuestion Answering	—Unverified
A review of faithfulness metrics for hallucination assessment in Large Language Models	Dec 31, 2024	BenchmarkingHallucination	—Unverified
Adaptive Wizard for Removing Cross-Tier Misconfigurations in Active Directory	May 2, 2025	Multiple-choice	—Unverified
Characterizing Large Language Models as Rationalizers of Knowledge-intensive Tasks	Nov 9, 2023	Multiple-choiceWorld Knowledge	—Unverified
Changing Answer Order Can Decrease MMLU Accuracy	Jun 27, 2024	MMLUMultiple-choice	—Unverified
Fill-in-the-Blank: A Challenging Video Understanding Evaluation Framework	Nov 16, 2021	Multiple-choiceQuestion Answering	—Unverified
Are LLM-generated plain language summaries truly understandable? A large-scale crowdsourced evaluation	May 15, 2025	InformativenessMultiple-choice	—Unverified
CG-Bench: Clue-grounded Question Answering Benchmark for Long Video Understanding	Dec 16, 2024	HallucinationMultiple-choice	—Unverified
Adaptive Crowdsourcing Algorithms for the Bandit Survey Problem	Feb 13, 2013	Information RetrievalMultiple-choice	—Unverified
CFinBench: A Comprehensive Chinese Financial Benchmark for Large Language Models	Jul 2, 2024	Multiple-choice	—Unverified
Few-Shot Image Classification and Segmentation as Visual Question Answering Using Vision-Language Models	Mar 15, 2024	Few-Shot Image Classificationimage-classification	—Unverified
CBT-Bench: Evaluating Large Language Models on Assisting Cognitive Behavior Therapy	Oct 17, 2024	Multiple-choiceResponse Generation	—Unverified
A recent evaluation on the performance of LLMs on radiation oncology physics using questions of randomly shuffled options	Dec 14, 2024	Multiple-choice	—Unverified
AlignMMBench: Evaluating Chinese Multimodal Alignment in Large Vision-Language Models	Jun 13, 2024	Multiple-choice	—Unverified
AraTrust: An Evaluation of Trustworthiness for LLMs in Arabic	Mar 14, 2024	EthicsMultiple-choice	—Unverified
Recent Advances in Multi-Choice Machine Reading Comprehension: A Survey on Methods and Datasets	Aug 4, 2024	Few-Shot LearningMachine Reading Comprehension	—Unverified
Field-testing items using artificial intelligence: Natural language processing with transformers	Oct 18, 2023	Multiple-choice	—Unverified
Fine-tuning BERT with Focus Words for Explanation Regeneration	Dec 1, 2020	Explanation GenerationMultiple-choice	—Unverified
Can We Trust LLMs? Mitigate Overconfidence Bias in LLMs through Knowledge Transfer	May 27, 2024	Multiple-choiceSentiment Analysis	—Unverified
AraSTEM: A Native Arabic Multiple Choice Question Benchmark for Evaluating LLMs Knowledge In STEM Subjects	Dec 31, 2024	BenchmarkingMultiple-choice	—Unverified
FAMULUS: Interactive Annotation and Feedback Generation for Teaching Diagnostic Reasoning	Aug 29, 2019	DiagnosticMultiple-choice	—Unverified
Can Multimodal LLMs do Visual Temporal Understanding and Reasoning? The answer is No!	Jan 18, 2025	Multiple-choiceQuestion Answering	—Unverified
GeoSQA: A Benchmark for Scenario-based Question Answering in the Geography Domain at High School Level	Aug 20, 2019	General KnowledgeMultiple-choice	—Unverified
FarsEval-PKBETS: A new diverse benchmark for evaluating Persian large language models	Apr 20, 2025	DescriptiveEthics	—Unverified
Aqulia-Med LLM: Pioneering Full-Process Open-Source Medical Language Models	Jun 18, 2024	Multiple-choice	—Unverified
FactTest: Factuality Testing in Large Language Models with Finite-Sample and Distribution-Free Guarantees	Nov 4, 2024	Multiple-choiceQuestion Answering	—Unverified
Can Generative Pre-trained Transformers (GPT) Pass Assessments in Higher Education Programming Courses?	Mar 16, 2023	Multiple-choice	—Unverified
Can Crowdsourcing be used for Effective Annotation of Arabic?	May 1, 2014	Entity ResolutionMultiple-choice	—Unverified
Can ChatGPT pass the Vietnamese National High School Graduation Examination?	Jun 15, 2023	Language ModelingLanguage Modelling	—Unverified
Applying IRT to Distinguish Between Human and Generative AI Responses to Multiple-Choice Assessments	Nov 28, 2024	Multiple-choice	—Unverified
Enhancing lexical-based approach with external knowledge for Vietnamese multiple-choice machine reading comprehension	Jan 16, 2020	Machine Reading ComprehensionMultiple-choice	—Unverified
Can AI Master Construction Management (CM)? Benchmarking State-of-the-Art Large Language Models on CM Certification Exams	Apr 4, 2025	BenchmarkingManagement	—Unverified
A Joint-Reasoning based Disease Q&A System	Jan 6, 2024	Knowledge GraphsMisinformation	—Unverified
How Additional Knowledge can Improve Natural Language Commonsense Question Answering?	Sep 19, 2019	ArticlesLanguage Modeling	—Unverified
Analysis of the Cambridge Multiple-Choice Questions Reading Dataset with a Focus on Candidate Response Distribution	Jun 22, 2023	Multiple-choice	—Unverified
Answer Uncertainty and Unanswerability in Multiple-Choice Machine Reading Comprehension	May 1, 2022	Machine Reading ComprehensionMultiple-choice	—Unverified
Answer Uncertainty and Unanswerability in Multiple-Choice Machine Reading Comprehension	Jan 16, 2022	Machine Reading ComprehensionMultiple-choice	—Unverified
Bridging the Language Gap: Knowledge Injected Multilingual Question Answering	Apr 6, 2023	Cross-Lingual TransferExtractive Question-Answering	—Unverified
Bridging Information-Seeking Human Gaze and Machine Reading Comprehension	Sep 30, 2020	Machine Reading ComprehensionMultiple-choice	—Unverified
Adapting Vision-Language Models for Evaluating World Models	Jun 22, 2025	Action RecognitionMultimodal Reasoning	—Unverified
Exposing the Limits of Video-Text Models through Contrast Sets	Jan 16, 2022	Language ModelingLanguage Modelling	—Unverified
FAVOR-Bench: A Comprehensive Benchmark for Fine-Grained Video Motion Understanding	Mar 19, 2025	BenchmarkingMultiple-choice	—Unverified
SHARP: Unlocking Interactive Hallucination via Stance Transfer in Role-Playing Agents	Nov 12, 2024	General KnowledgeHallucination	—Unverified

Show:10 25 50

← PrevPage 7 of 23Next →

No leaderboard results yet.