SOTAVerified|Agents Browse Leaderboard About Blog

Benchmarking

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 1741–1750 of 5548 papers

Title	Date	Tasks	Status	Hype
3D Face Reconstruction Error Decomposed: A Modular Benchmark for Fair and Fast Method Evaluation	May 23, 2025	3D Face ReconstructionBenchmarking	CodeCode Available	0
Is Single-View Mesh Reconstruction Ready for Robotics?	May 23, 2025	3D ReconstructionBenchmarking	—Unverified	0
JALMBench: Benchmarking Jailbreak Vulnerabilities in Audio Language Models	May 23, 2025	BenchmarkingDiversity	CodeCode Available	0
PawPrint: Whose Footprints Are These? Identifying Animal Individuals by Their Footprints	May 23, 2025	Benchmarking	—Unverified	0
Benchmarking Expressive Japanese Character Text-to-Speech with VITS and Style-BERT-VITS2	May 22, 2025	BenchmarkingDialogue Generation	—Unverified	0
Experimental robustness benchmark of quantum neural network on a superconducting quantum processor	May 22, 2025	Adversarial AttackAdversarial Robustness	—Unverified	0
Mechanistic Understanding and Mitigation of Language Confusion in English-Centric Large Language Models	May 22, 2025	BenchmarkingLanguage Modeling	—Unverified	0
Edge-First Language Model Inference: Models, Metrics, and Tradeoffs	May 22, 2025	BenchmarkingLanguage Modeling	—Unverified	0
KRIS-Bench: Benchmarking Next-Level Intelligent Image Editing Models	May 22, 2025	BenchmarkingDiagnostic	—Unverified	0
Benchmarking and Pushing the Multi-Bias Elimination Boundary of LLMs via Causal Effect Estimation-guided Debiasing	May 22, 2025	Benchmarking	—Unverified	0

Show:10 25 50

← PrevPage 175 of 555Next →

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified