SOTAVerified|Agents Browse Leaderboard About Blog

Benchmarking

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 1551–1560 of 5548 papers

Title	Date	Tasks	Status	Hype
Statistical Multicriteria Evaluation of LLM-Generated Text	Jun 22, 2025	BenchmarkingDiversity	CodeCode Available	0
Leveling the Playing Field: Carefully Comparing Classical and Learned Controllers for Quadrotor Trajectory Tracking	Jun 21, 2025	BenchmarkingReinforcement Learning (RL)	—Unverified	0
Universal Music Representations? Evaluating Foundation Models on World Music Corpora	Jun 20, 2025	BenchmarkingFew-Shot Learning	CodeCode Available	0
A Comparative Analysis of Principal Component Analysis (PCA) and Singular Value Decomposition (SVD) as Dimensionality Reduction Techniques	Jun 20, 2025	BenchmarkingDimensionality Reduction	—Unverified	0
OSWorld-Human: Benchmarking the Efficiency of Computer-Use Agents	Jun 19, 2025	Benchmarking	—Unverified	0
Spotting tell-tale visual artifacts in face swapping videos: strengths and pitfalls of CNN detectors	Jun 19, 2025	BenchmarkingFace Swapping	—Unverified	0
Finance Language Model Evaluation (FLaME)	Jun 18, 2025	BenchmarkingLanguage Model Evaluation	—Unverified	0
PGLib-CO2: A Power Grid Library for Computing and Optimizing Carbon Emissions	Jun 17, 2025	Benchmarking	—Unverified	0
A large-scale heterogeneous 3D magnetic resonance brain imaging dataset for self-supervised learning	Jun 17, 2025	BenchmarkingSelf-Supervised Learning	—Unverified	0
ImpliRet: Benchmarking the Implicit Fact Retrieval Challenge	Jun 17, 2025	BenchmarkingRetrieval	CodeCode Available	0

Show:10 25 50

← PrevPage 156 of 555Next →

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified