SOTAVerified

Multiple-choice

Papers

Showing 161-170 of 1107 papers

| Title | Status | Hype |
| --- | --- | --- |
| Estimating Contamination via Perplexity: Quantifying Memorisation in Language Model Evaluation | Code | 1 |
| LLM-Coordination: Evaluating and Analyzing Multi-agent Coordination Abilities in Large Language Models | Code | 1 |
| Annealed Winner-Takes-All for Motion Forecasting | Code | 1 |
| FoodieQA: A Multimodal Dataset for Fine-Grained Understanding of Chinese Food Culture | Code | 1 |
| An Open Source Data Contamination Report for Large Language Models | Code | 1 |
| From Machine Reading Comprehension to Dialogue State Tracking: Bridging the Gap | Code | 1 |
| EgoSchema: A Diagnostic Benchmark for Very Long-form Video Language Understanding | Code | 1 |
| E-EVAL: A Comprehensive Chinese K-12 Education Evaluation Benchmark for Large Language Models | Code | 1 |
| Fine-tuning Multimodal Large Language Models for Product Bundling | Code | 1 |
| Enhancing Human-like Multi-Modal Reasoning: A New Challenging Dataset and Comprehensive Framework | Code | 1 |

No leaderboard results yet.