SOTAVerified

Multiple-choice

Papers

Showing 21-30 of 1107 papers

Title | Status | Hype
PCToolkit: A Unified Plug-and-Play Prompt Compression Toolkit of Large Language Models | Code | 3
SEED-Bench-2-Plus: Benchmarking Multimodal Large Language Models with Text-Rich Visual Comprehension | Code | 3
MuChoMusic: Evaluating Music Understanding in Multimodal Audio-Language Models | Code | 3
Improving Medical Reasoning through Retrieval and Self-Reflection with Retrieval-Augmented Large Language Models | Code | 2
MMIU: Multimodal Multi-image Understanding for Evaluating Large Vision-Language Models | Code | 2
HourVideo: 1-Hour Video-Language Understanding | Code | 2
FinEval: A Chinese Financial Domain Knowledge Evaluation Benchmark for Large Language Models | Code | 2
EchoInk-R1: Exploring Audio-Visual Reasoning in Multimodal LLMs via Reinforcement Learning | Code | 2
Automated Evaluation of Retrieval-Augmented Language Models with Task-Specific Exam Generation | Code | 2
CyberMetric: A Benchmark Dataset based on Retrieval-Augmented Generation for Evaluating LLMs in Cybersecurity Knowledge | Code | 2

No leaderboard results yet.