SOTAVerified

Benchmarking

Papers

Showing 121–130 of 5548 papers

Title	Date	Tasks	Status	Hype
Solving excited states for long-range interacting trapped ions with neural networks	Jun 10, 2025	Benchmarking	Unverified	0
Benchmarking Foundation Speech and Language Models for Alzheimer's Disease and Related Dementia Detection from Spontaneous Speech	Jun 9, 2025	Automatic Speech Recognition, Automatic Speech Recognition (ASR)	Unverified	0
The Catechol Benchmark: Time-series Solvent Selection Data for Few-shot Machine Learning	Jun 9, 2025	Active Learning, Benchmarking	Code Available	0
Generative Models at the Frontier of Compression: A Survey on Generative Face Video Coding	Jun 9, 2025	Benchmarking, Video Compression	Unverified	0
REMoH: A Reflective Evolution of Multi-objective Heuristics approach via Large Language Models	Jun 9, 2025	Benchmarking, Decision Making	Unverified	0
SurgBench: A Unified Large-Scale Benchmark for Surgical Video Analysis	Jun 9, 2025	Action Classification, Benchmarking	Unverified	0
HuSc3D: Human Sculpture dataset for 3D object reconstruction	Jun 9, 2025	3D Object Reconstruction, 3D Reconstruction	Code Available	0
RADAR: Benchmarking Language Models on Imperfect Tabular Data	Jun 9, 2025	Benchmarking, Missing Values	Code Available	1
Ensuring Reliability of Curated EHR-Derived Data: The Validation of Accuracy for LLM/ML-Extracted Information and Data (VALID) Framework	Jun 9, 2025	Benchmarking, Fairness	Unverified	0
SOP-Bench: Complex Industrial SOPs for Evaluating LLM Agents	Jun 9, 2025	Benchmarking, Synthetic Data Generation	Unverified	0

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified