SOTAVerified

Benchmarking

Papers

Showing 1741–1750 of 5548 papers

Title	Date	Tasks	Status	Hype	Score
Introducing SLAMBench, a performance and accuracy benchmarking methodology for SLAM	Oct 8, 2014	Benchmarking	Code Available	0	5
Air Learning: A Deep Reinforcement Learning Gym for Autonomous Aerial Robot Visual Navigation	Jun 2, 2019	Benchmarking, Deep Reinforcement Learning	Code Available	0	5
Can a single neuron learn predictive uncertainty?	Jun 7, 2021	Benchmarking, Conformal Prediction	Code Available	0	5
Can AI Validate Science? Benchmarking LLMs for Accurate Scientific Claim Evidence Reasoning	Jun 9, 2025	Benchmarking, Diagnostic	Code Available	0	5
Integration of nested cross-validation, automated hyperparameter optimization, high-performance computing to reduce and quantify the variance of test performance estimation of deep learning models	Mar 11, 2025	Benchmarking, Hyperparameter Optimization	Code Available	0	5
Integrating Expert Knowledge into Logical Programs via LLMs	Feb 17, 2025	Benchmarking, Logical Reasoning	Code Available	0	5
JavaBench: A Benchmark of Object-Oriented Code Generation for Evaluating Large Language Models	Jun 10, 2024	Benchmarking, Code Generation	Code Available	0	5
Analyzing the Feature Extractor Networks for Face Image Synthesis	Jun 4, 2024	Benchmarking, Image Generation	Code Available	0	5
InstaIndoor and Multi-modal Deep Learning for Indoor Scene Recognition	Dec 23, 2021	Benchmarking, Deep Learning	Code Available	0	5
Benchmarking Multi-dimensional AIGC Video Quality Assessment: A Dataset and Unified Model	Jul 31, 2024	Benchmarking, Large Language Model	Code Available	0	5

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified