SOTAVerified

Multiple-choice

Papers

Showing 11–20 of 1107 papers

Title | Status | Hype
Thunder-NUBench: A Benchmark for LLMs' Sentence-Level Negation Understanding | | 0
Training-free LLM Merging for Multi-task Learning | Code | 0
Instruction Tuning and CoT Prompting for Contextual Medical QA with LLMs | | 0
Different Questions, Different Models: Fine-Grained Evaluation of Uncertainty and Calibration in Clinical QA with LLMs | | 0
A Shortcut-aware Video-QA Benchmark for Physical Understanding via Minimal Video Pairs | | 0
VersaVid-R1: A Versatile Video Understanding and Reasoning Model from Question Answering to Captioning Tasks | | 0
ARGUS: Hallucination and Omission Evaluation in Video-LLMs | | 0
Evaluating LLM-corrupted Crowdsourcing Data Without Ground Truth | | 0
STSBench: A Spatio-temporal Scenario Benchmark for Multi-modal Large Language Models in Autonomous Driving | Code | 1
Multiple-Choice Question Generation Using Large Language Models: Methodology and Educator Insights | | 0

No leaderboard results yet.