SOTAVerified

Benchmarking

Papers

Showing 391-400 of 5548 papers

Title | Status | Hype
Learning to Fly -- a Gym Environment with PyBullet Physics for Reinforcement Learning of Multi-agent Quadcopter Control | Code | 2
RoboPianist: Dexterous Piano Playing with Deep Reinforcement Learning | Code | 2
REAL-Colon: A dataset for developing real-world AI applications in colonoscopy | Code | 2
Benchmarking Uncertainty Quantification Methods for Large Language Models with LM-Polygraph | Code | 2
BARS: Towards Open Benchmarking for Recommender Systems | Code | 2
Benchmarking Zero-shot Text Classification: Datasets, Evaluation and Entailment Approach | Code | 2
COSMOS: Catching Out-of-Context Misinformation with Self-Supervised Learning | Code | 1
Category-wise Fine-Tuning: Resisting Incorrect Pseudo-Labels in Multi-Label Image Classification with Partial Labels | Code | 1
RADAR: Benchmarking Language Models on Imperfect Tabular Data | Code | 1
Benchmarking Bias Mitigation Algorithms in Representation Learning through Fairness Metrics | Code | 1

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | - | Unverified