Benchmarking

Papers

Showing 61–70 of 5548 papers

Title	Date	Tasks	Status	Hype
On the Robustness of Human-Object Interaction Detection against Distribution Shift	Jun 22, 2025	Benchmarking, Data Augmentation	Unverified	0
TAB: Unified Benchmarking of Time Series Anomaly Detection Methods	Jun 22, 2025	Anomaly Detection, Benchmarking	Code Available	2
Leveling the Playing Field: Carefully Comparing Classical and Learned Controllers for Quadrotor Trajectory Tracking	Jun 21, 2025	Benchmarking, Reinforcement Learning (RL)	Unverified	0
ConsumerBench: Benchmarking Generative AI Applications on End-User Devices	Jun 21, 2025	Benchmarking, CPU	Code Available	1
A Comparative Analysis of Principal Component Analysis (PCA) and Singular Value Decomposition (SVD) as Dimensionality Reduction Techniques	Jun 20, 2025	Benchmarking, Dimensionality Reduction	Unverified	0
Universal Music Representations? Evaluating Foundation Models on World Music Corpora	Jun 20, 2025	Benchmarking, Few-Shot Learning	Code Available	0
TabArena: A Living Benchmark for Machine Learning on Tabular Data	Jun 20, 2025	Benchmarking	Code Available	3
InstructTTSEval: Benchmarking Complex Natural-Language Instruction Following in Text-to-Speech Systems	Jun 19, 2025	Benchmarking, Descriptive	Code Available	1
Spotting tell-tale visual artifacts in face swapping videos: strengths and pitfalls of CNN detectors	Jun 19, 2025	Benchmarking, Face Swapping	Unverified	0
OSWorld-Human: Benchmarking the Efficiency of Computer-Use Agents	Jun 19, 2025	Benchmarking	Unverified	0

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified