SOTAVerified

Multiple-choice

Papers

Showing 241-250 of 1107 papers

Title | Status | Hype
TSQA: Tabular Scenario Based Question Answering | Code | 1
TUMTraffic-VideoQA: A Benchmark for Unified Spatio-Temporal Video Understanding in Traffic Scenes | Code | 1
Counterfactual Variable Control for Robust and Interpretable Question Answering | Code | 1
Uncertainty is Fragile: Manipulating Uncertainty in Large Language Models | Code | 1
Complex Reasoning over Logical Queries on Commonsense Knowledge Graphs | Code | 1
Assessing the Chemical Intelligence of Large Language Models | Code | 1
Unsupervised Commonsense Question Answering with Self-Talk | Code | 1
Conformal Prediction with Large Language Models for Multi-Choice Question Answering | Code | 1
Do Large Language Models Understand Conversational Implicature -- A case study with a chinese sitcom | Code | 1
EgoSchema: A Diagnostic Benchmark for Very Long-form Video Language Understanding | Code | 1

No leaderboard results yet.