Benchmarking

Papers

Showing 2171–2180 of 5548 papers

Title	Date	Tasks	Status	Hype
A Benchmarking Environment for Reinforcement Learning Based Task Oriented Dialogue Management	Nov 29, 2017	Benchmarking, Deep Reinforcement Learning	Unverified	0
Breakpoint: Scalable evaluation of system-level reasoning in LLM code agents	May 30, 2025	Benchmarking, Code Repair	Unverified	0
A new pathway to generative artificial intelligence by minimizing the maximum entropy	Feb 18, 2025	Benchmarking	Unverified	0
Exploring the Impact of a Transformer's Latent Space Geometry on Downstream Task Performance	Jun 18, 2024	Benchmarking	Unverified	0
BraTS-Path Challenge: Assessing Heterogeneous Histopathologic Brain Tumor Sub-regions	May 17, 2024	Benchmarking, Prognosis	Unverified	0
Adaptive Gradient Methods with Local Guarantees	Mar 2, 2022	Benchmarking	Unverified	0
Object Pose Estimation in Robotics Revisited	Jun 6, 2019	3D Pose Estimation, 6D Pose Estimation	Unverified	0
BOX3D: Lightweight Camera-LiDAR Fusion for 3D Object Detection and Localization	Aug 27, 2024	3D Object Detection, Benchmarking	Unverified	0
Scale MLPerf-0.6 models on Google TPU-v3 Pods	Sep 21, 2019	Benchmarking	Unverified	0
Boundary Detection Benchmarking: Beyond F-Measures	Jun 1, 2013	Benchmarking, Boundary Detection	Unverified	0

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified