SOTAVerified

Multiple-choice

Papers

Showing 531-540 of 1107 papers

Title | Status | Hype
MRAG-Bench: Vision-Centric Evaluation for Retrieval-Augmented Multimodal Models | | 0
Sample then Identify: A General Framework for Risk Control and Assessment in Multimodal Large Language Models | | 0
TVBench: Redesigning Video-Language Evaluation | | 0
Answering Questions in Stages: Prompt Chaining for Contract QA | | 0
Utilize the Flow before Stepping into the Same River Twice: Certainty Represented Knowledge Flow for Refusal-Aware Instruction Tuning | Code | 0
ActionAtlas: A VideoQA Benchmark for Domain-specialized Action Recognition | | 0
ACPBench: Reasoning about Action, Change, and Planning | | 0
Plausibly Problematic Questions in Multiple-Choice Benchmarks for Commonsense Reasoning | Code | 0
Video Instruction Tuning With Synthetic Data | | 0
Listening to the Wise Few: Select-and-Copy Attention Heads for Multiple-Choice QA | | 0

No leaderboard results yet.