SOTAVerified|Agents Browse Leaderboard About

Benchmarking

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 1421–1430 of 5548 papers

Title	Date	Tasks	Status	Hype	Score
Autonomous Microscopy Experiments through Large Language Model Agents	Dec 18, 2024	BenchmarkingExperimental Design	CodeCode Available	1	5
EndoSLAM Dataset and An Unsupervised Monocular Visual Odometry and Depth Estimation Approach for Endoscopic Videos: Endo-SfMLearner	Jun 30, 2020	BenchmarkingDepth Estimation	CodeCode Available	1	5
IOHanalyzer: Detailed Performance Analyses for Iterative Optimization Heuristics	Jul 8, 2020	Bayesian OptimizationBenchmarking	CodeCode Available	1	5
RAD: A Comprehensive Dataset for Benchmarking the Robustness of Image Anomaly Detection	Jun 11, 2024	Anomaly DetectionBenchmarking	CodeCode Available	1	5
Is Multi-Hop Reasoning Really Explainable? Towards Benchmarking Reasoning Interpretability	Apr 14, 2021	BenchmarkingLink Prediction	CodeCode Available	1	5
Enhancing Biomedical Relation Extraction with Directionality	Jan 23, 2025	BenchmarkingDocument-level Relation Extraction	CodeCode Available	1	5
JuDGE: Benchmarking Judgment Document Generation for Chinese Legal System	Mar 18, 2025	BenchmarkingIn-Context Learning	CodeCode Available	1	5
Benchmarking Vision, Language, & Action Models on Robotic Learning Tasks	Nov 4, 2024	Action GenerationBenchmarking	CodeCode Available	1	5
Benchpress: A Scalable and Versatile Workflow for Benchmarking Structure Learning Algorithms	Jul 8, 2021	Benchmarking	CodeCode Available	1	5
BEND: Benchmarking DNA Language Models on biologically meaningful tasks	Nov 21, 2023	BenchmarkingLanguage Modeling	CodeCode Available	1	5

Show:10 25 50

← PrevPage 143 of 555Next →

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified