SOTAVerified|Agents Browse Leaderboard About

Benchmarking

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 461–470 of 5548 papers

Title	Date	Tasks	Status	Hype
Towards Motion Forecasting with Real-World Perception Inputs: Are End-to-End Approaches Competitive?	Jun 15, 2023	Autonomous DrivingAutonomous Vehicles	CodeCode Available	1
COCO: The Large Scale Black-Box Optimization Benchmarking (bbob-largescale) Test Suite	Mar 15, 2019	Benchmarking	CodeCode Available	1
CAVIAR: Co-simulation of 6G Communications, 3D Scenarios and AI for Digital Twins	Jan 6, 2024	Autonomous VehiclesBenchmarking	CodeCode Available	1
AnomalyHop: An SSL-based Image Anomaly Localization Method	May 8, 2021	Anomaly LocalizationBenchmarking	CodeCode Available	1
CausalTime: Realistically Generated Time-series for Benchmarking of Causal Discovery	Oct 3, 2023	BenchmarkingCausal Discovery	CodeCode Available	1
CODEMENV: Benchmarking Large Language Models on Code Migration	Jun 1, 2025	Benchmarking	CodeCode Available	1
CBench: Towards Better Evaluation of Question Answering Over Knowledge Graphs	Apr 5, 2021	BenchmarkingKnowledge Graphs	CodeCode Available	1
Chaos as an interpretable benchmark for forecasting and data-driven modelling	Oct 11, 2021	BenchmarkingSymbolic Regression	CodeCode Available	1
CodeUpdateArena: Benchmarking Knowledge Editing on API Updates	Jul 8, 2024	Benchmarkingknowledge editing	CodeCode Available	1
CheXphoto: 10,000+ Photos and Transformations of Chest X-rays for Benchmarking Deep Learning Robustness	Jul 13, 2020	Benchmarking	CodeCode Available	1

Show:10 25 50

← PrevPage 47 of 555Next →

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified