SOTAVerified

Benchmarking

Papers

Showing 1361–1370 of 5548 papers

| Title | Status | Hype |
| --- | --- | --- |
| ArabicaQA: A Comprehensive Dataset for Arabic Question Answering | Code | 1 |
| Benchmarking the Combinatorial Generalizability of Complex Query Answering on Knowledge Graphs | Code | 1 |
| Ineq-Comp: Benchmarking Human-Intuitive Compositional Reasoning in Automated Theorem Proving on Inequalities | Code | 1 |
| RGB-D Indiscernible Object Counting in Underwater Scenes | Code | 1 |
| Benchmarking human visual search computational models in natural scenes: models comparison and reference datasets | Code | 1 |
| OpenCIL: Benchmarking Out-of-Distribution Detection in Class-Incremental Learning | Code | 1 |
| IndoNLU: Benchmark and Resources for Evaluating Indonesian Natural Language Understanding | Code | 1 |
| Benchmarking the Generation of Fact Checking Explanations | Code | 1 |
| IMP-MARL: a Suite of Environments for Large-scale Infrastructure Management Planning via MARL | Code | 1 |
| Aquatic Navigation: A Challenging Benchmark for Deep Reinforcement Learning | Code | 1 |

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | GPT-4 Turbo | ACC | 0.56 | - | Unverified |