SOTAVerified

Multiple-choice

Papers

Showing 191–200 of 1107 papers

Title | Status | Hype
EgoSchema: A Diagnostic Benchmark for Very Long-form Video Language Understanding | Code | 1
CMMU: A Benchmark for Chinese Multi-modal Multi-type Question Understanding and Reasoning | Code | 1
MindGames: Targeting Theory of Mind in Large Language Models with Dynamic Epistemic Modal Logic | Code | 1
Mind Reasoning Manners: Enhancing Type Perception for Generalized Zero-shot Logical Reasoning over Text | Code | 1
Enhancing Human-like Multi-Modal Reasoning: A New Challenging Dataset and Comprehensive Framework | Code | 1
A BERT-based Distractor Generation Scheme with Multi-tasking and Negative Answer Training Strategies. | Code | 1
AutoLogi: Automated Generation of Logic Puzzles for Evaluating Reasoning Abilities of Large Language Models | Code | 1
E-EVAL: A Comprehensive Chinese K-12 Education Evaluation Benchmark for Large Language Models | Code | 1
Enhancing Knowledge Tracing with Concept Map and Response Disentanglement | Code | 1
ExplaGraphs: An Explanation Graph Generation Task for Structured Commonsense Reasoning | Code | 1

No leaderboard results yet.