Multiple-choice

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 76–100 of 1107 papers

Title	Date	Tasks	Status	Hype
Daily-Omni: Towards Audio-Visual Reasoning with Temporal Alignment across Modalities	May 23, 2025	Automatic Speech RecognitionAutomatic Speech Recognition (ASR)	CodeCode Available	1
Let Androids Dream of Electric Sheep: A Human-like Image Implication Understanding and Reasoning Framework	May 22, 2025	Multiple-choiceVisual Question Answering (VQA)	CodeCode Available	1
LogicOCR: Do Your Large Multimodal Models Excel at Logical Reasoning on Text-Rich Images?	May 18, 2025	Logical ReasoningMultimodal Reasoning	CodeCode Available	1
GIE-Bench: Towards Grounded Evaluation for Text-Guided Image Editing	May 16, 2025	Instruction FollowingMultiple-choice	CodeCode Available	1
Ranked Voting based Self-Consistency of Large Language Models	May 16, 2025	Multiple-choiceOpen-Ended Question Answering	CodeCode Available	1
IRLBench: A Multi-modal, Culturally Grounded, Parallel Irish-English Benchmark for Open-Ended LLM Reasoning Evaluation	May 16, 2025	Multiple-choice	CodeCode Available	1
Benchmarking AI scientists in omics data-driven biological research	May 13, 2025	BenchmarkingMultiple-choice	CodeCode Available	1
Assessing the Chemical Intelligence of Large Language Models	May 12, 2025	Multiple-choice	CodeCode Available	1
ChartQAPro: A More Diverse and Challenging Benchmark for Chart Question Answering	Apr 7, 2025	Chart Question AnsweringChart Understanding	CodeCode Available	1
Mobile-MMLU: A Mobile Intelligence Language Understanding Benchmark	Mar 26, 2025	MMLUMultiple-choice	CodeCode Available	1
Language Model Uncertainty Quantification with Attention Chain	Mar 24, 2025	Computational EfficiencyLanguage Modeling	CodeCode Available	1
Hybrid-Level Instruction Injection for Video Token Compression in Multi-modal Large Language Models	Mar 20, 2025	Multiple-choiceVideo Understanding	CodeCode Available	1
MicroVQA: A Multimodal Reasoning Benchmark for Microscopy-Based Scientific Research	Mar 17, 2025	ArticlesBenchmarking	CodeCode Available	1
CUPCase: Clinically Uncommon Patient Cases and Diagnoses Dataset	Mar 8, 2025	Multiple-choice	CodeCode Available	1
AutoLogi: Automated Generation of Logic Puzzles for Evaluating Reasoning Abilities of Large Language Models	Feb 24, 2025	Logical ReasoningMultiple-choice	CodeCode Available	1
Mind the Confidence Gap: Overconfidence, Calibration, and Distractor Effects in Large Language Models	Feb 16, 2025	Multiple-choice	CodeCode Available	1
TUMTraffic-VideoQA: A Benchmark for Unified Spatio-Temporal Video Understanding in Traffic Scenes	Feb 4, 2025	Autonomous DrivingMultiple-choice	CodeCode Available	1
FaceXBench: Evaluating Multimodal LLMs on Face Understanding	Jan 17, 2025	FairnessMultiple-choice	CodeCode Available	1
ToMATO: Verbalizing the Mental States of Role-Playing LLMs for Benchmarking Theory of Mind	Jan 15, 2025	BenchmarkingMultiple-choice	CodeCode Available	1
ZNO-Eval: Benchmarking reasoning capabilities of large language models in Ukrainian	Jan 12, 2025	BenchmarkingMath	CodeCode Available	1
Automated Generation of Challenging Multiple-Choice Questions for Vision Language Model Evaluation	Jan 6, 2025	Language Model EvaluationLanguage Modeling	CodeCode Available	1
Unifying Specialized Visual Encoders for Video Language Models	Jan 2, 2025	Multiple-choiceVideo Understanding	CodeCode Available	1
Filter-then-Generate: Large Language Models with Structure-Text Adapter for Knowledge Graph Completion	Dec 12, 2024	HallucinationKnowledge Graph Completion	CodeCode Available	1
AV-Odyssey Bench: Can Your Multimodal LLMs Really Understand Audio-Visual Information?	Dec 3, 2024	Multiple-choice	CodeCode Available	1
SailCompass: Towards Reproducible and Robust Evaluation for Southeast Asian Languages	Dec 2, 2024	Multiple-choice	CodeCode Available	1

Show:10 25 50

← PrevPage 4 of 45Next →

No leaderboard results yet.