SOTAVerified

Benchmarking

Papers

Showing 291–300 of 5548 papers

Title	Date	Tasks	Status	Hype
So-Fake: Benchmarking and Explaining Social Media Image Forgery Detection	May 24, 2025	Benchmarking, Image Forgery Detection	Unverified	0
MMMG: a Comprehensive and Reliable Evaluation Suite for Multitask Multimodal Generation	May 23, 2025	Audio Generation, Benchmarking	Unverified	0
Benchmark for Antibody Binding Affinity Maturation and Design	May 23, 2025	Benchmarking	Unverified	0
U2-BENCH: Benchmarking Large Vision-Language Models on Ultrasound Understanding	May 23, 2025	Benchmarking, Spatial Reasoning	Unverified	0
3D Face Reconstruction Error Decomposed: A Modular Benchmark for Fair and Fast Method Evaluation	May 23, 2025	3D Face Reconstruction, Benchmarking	Code Available	0
A Position Paper on the Automatic Generation of Machine Learning Leaderboards	May 23, 2025	Benchmarking, Position	Code Available	0
SemSegBench & DetecBench: Benchmarking Reliability and Generalization Beyond Classification	May 23, 2025	Benchmarking, Classification	Code Available	0
PawPrint: Whose Footprints Are These? Identifying Animal Individuals by Their Footprints	May 23, 2025	Benchmarking	Unverified	0
PerMedCQA: Benchmarking Large Language Models on Medical Consumer Question Answering in Persian Language	May 23, 2025	Benchmarking, Question Answering	Unverified	0
FullFront: Benchmarking MLLMs Across the Full Front-End Engineering Workflow	May 23, 2025	Benchmarking, Code Generation	Code Available	1

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified