SOTAVerified

Multiple-choice

Papers

Showing 71-80 of 1107 papers

Title | Status | Hype
All in One: Exploring Unified Video-Language Pre-training | Code | 2
What Disease does this Patient Have? A Large-scale Open Domain Question Answering Dataset from Medical Exams | Code | 2
STSBench: A Spatio-temporal Scenario Benchmark for Multi-modal Large Language Models in Autonomous Driving | Code | 1
LLMEval-Med: A Real-world Clinical Benchmark for Medical LLMs with Physician Validation | Code | 1
Polishing Every Facet of the GEM: Testing Linguistic Competence of LLMs and Humans in Korean | Code | 1
Daily-Omni: Towards Audio-Visual Reasoning with Temporal Alignment across Modalities | Code | 1
Let Androids Dream of Electric Sheep: A Human-like Image Implication Understanding and Reasoning Framework | Code | 1
LogicOCR: Do Your Large Multimodal Models Excel at Logical Reasoning on Text-Rich Images? | Code | 1
GIE-Bench: Towards Grounded Evaluation for Text-Guided Image Editing | Code | 1
Ranked Voting based Self-Consistency of Large Language Models | Code | 1

No leaderboard results yet.