Benchmarking

Papers

Showing 391–400 of 5548 papers

Title	Date	Tasks	Status	Hype	Score
Learning to Fly -- a Gym Environment with PyBullet Physics for Reinforcement Learning of Multi-agent Quadcopter Control	Mar 3, 2021	Benchmarking, Multi-agent Reinforcement Learning	Code Available	2	5
RoboPianist: Dexterous Piano Playing with Deep Reinforcement Learning	Apr 9, 2023	Benchmarking, Deep Reinforcement Learning	Code Available	2	5
REAL-Colon: A dataset for developing real-world AI applications in colonoscopy	Mar 4, 2024	Benchmarking	Code Available	2	5
Benchmarking Uncertainty Quantification Methods for Large Language Models with LM-Polygraph	Jun 21, 2024	Benchmarking, Text Generation	Code Available	2	5
BARS: Towards Open Benchmarking for Recommender Systems	May 19, 2022	Benchmarking, Click-Through Rate Prediction	Code Available	2	5
Benchmarking Zero-shot Text Classification: Datasets, Evaluation and Entailment Approach	Aug 31, 2019	Articles, Benchmarking	Code Available	2	5
COSMOS: Catching Out-of-Context Misinformation with Self-Supervised Learning	Jan 15, 2021	Benchmarking, Misinformation	Code Available	1	5
Category-wise Fine-Tuning: Resisting Incorrect Pseudo-Labels in Multi-Label Image Classification with Partial Labels	Jan 30, 2024	Benchmarking, image-classification	Code Available	1	5
RADAR: Benchmarking Language Models on Imperfect Tabular Data	Jun 9, 2025	Benchmarking, Missing Values	Code Available	1	5
Benchmarking Bias Mitigation Algorithms in Representation Learning through Fairness Metrics	Jun 8, 2021	Age And Gender Classification, Benchmarking	Code Available	1	5

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified