Benchmarking

Papers

Showing 1821–1830 of 5548 papers

Title	Date	Tasks	Status	Hype
STEP: A Unified Spiking Transformer Evaluation Platform for Fair and Reproducible Benchmarking	May 16, 2025	Benchmarking	Code Available	0
CleanPatrick: A Benchmark for Image Data Cleaning	May 16, 2025	Benchmarking, Label Error Detection	Code Available	0
Visual Anomaly Detection under Complex View-Illumination Interplay: A Large-Scale Benchmark	May 16, 2025	Anomaly Detection, Benchmarking	Unverified	0
Relation Extraction Across Entire Books to Reconstruct Community Networks: The AffilKG Datasets	May 16, 2025	Benchmarking, Knowledge Graphs	Unverified	0
Benchmarking performance, explainability, and evaluation strategies of vision-language models for surgery: Challenges and opportunities	May 16, 2025	Benchmarking	Unverified	0
Benchmarking Spatiotemporal Reasoning in LLMs and Reasoning Models: Capabilities and Challenges	May 16, 2025	Benchmarking, State Estimation	Code Available	0
TCC-Bench: Benchmarking the Traditional Chinese Culture Understanding Capabilities of MLLMs	May 16, 2025	Benchmarking, Question Answering	Code Available	0
MedGUIDE: Benchmarking Clinical Decision-Making in Large Language Models	May 16, 2025	Benchmarking, Decision Making	Unverified	0
ASR-FAIRBENCH: Measuring and Benchmarking Equity Across Speech Recognition Systems	May 16, 2025	Automatic Speech Recognition, Automatic Speech Recognition (ASR)	Unverified	0
JointDistill: Adaptive Multi-Task Distillation for Joint Depth Estimation and Scene Segmentation	May 15, 2025	Benchmarking, Depth Estimation	Unverified	0

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified