SOTAVerified

Benchmarking

Papers

Showing 3771–3780 of 5548 papers

Title | Status | Hype
TASKOGRAPHY: Evaluating robot task planning over large 3D scene graphs | Code | 1
Graph Generative Model for Benchmarking Graph Neural Networks | Code | 1
A novel evaluation methodology for supervised Feature Ranking algorithms | Code | 0
Ensemble random forest filter: An alternative to the ensemble Kalman filter for inverse modeling | | 0
OVQA: A Clinically Generated Visual Question Answering Dataset | | 0
VMAS: A Vectorized Multi-Agent Simulator for Collective Robot Learning | Code | 2
Benefits and Challenges of Dynamic Modelling of Cascading Failures in Power Systems | | 0
Understanding Performance of Long-Document Ranking Models through Comprehensive Evaluation and Leaderboarding | Code | 2
Identifying the Context Shift between Test Benchmarks and Production Data | | 0
Can Language Models Make Fun? A Case Study in Chinese Comical Crosstalk | Code | 1
Page 378 of 555

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | | Unverified