SOTAVerified

Multiple-choice

Papers

Showing 21-30 of 1107 papers

Title | Status | Hype
PCToolkit: A Unified Plug-and-Play Prompt Compression Toolkit of Large Language Models | Code | 3
SEED-Bench-2-Plus: Benchmarking Multimodal Large Language Models with Text-Rich Visual Comprehension | Code | 3
MuChoMusic: Evaluating Music Understanding in Multimodal Audio-Language Models | Code | 3
Improving Medical Reasoning through Retrieval and Self-Reflection with Retrieval-Augmented Large Language Models | Code | 2
MMIU: Multimodal Multi-image Understanding for Evaluating Large Vision-Language Models | Code | 2
HourVideo: 1-Hour Video-Language Understanding | Code | 2
FinEval: A Chinese Financial Domain Knowledge Evaluation Benchmark for Large Language Models | Code | 2
All in One: Exploring Unified Video-Language Pre-training | Code | 2
Automated Evaluation of Retrieval-Augmented Language Models with Task-Specific Exam Generation | Code | 2
Exploring the Effect of Reinforcement Learning on Video Understanding: Insights from SEED-Bench-R1 | Code | 2
Page 3 of 111

No leaderboard results yet.