SOTAVerified|Agents Browse Leaderboard About

Benchmarking

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 211–220 of 5548 papers

Title	Date	Tasks	Status	Hype	Score
Fast Vision Transformers with HiLo Attention	May 26, 2022	BenchmarkingEfficient ViTs	CodeCode Available	2	5
HumanRefiner: Benchmarking Abnormal Human Generation and Refining with Coarse-to-fine Pose-Reversible Guidance	Jul 9, 2024	BenchmarkingConditional Image Generation	CodeCode Available	2	5
HypoBench: Towards Systematic and Principled Benchmarking for Hypothesis Generation	Apr 15, 2025	Benchmarkingscientific discovery	CodeCode Available	2	5
EffiBench: Benchmarking the Efficiency of Automatically Generated Code	Feb 3, 2024	BenchmarkingCode Completion	CodeCode Available	2	5
A large annotated medical image dataset for the development and evaluation of segmentation algorithms	Feb 25, 2019	BenchmarkingSegmentation	CodeCode Available	2	5
Envisioning Beyond the Pixels: Benchmarking Reasoning-Informed Visual Editing	Apr 3, 2025	BenchmarkingLogical Reasoning	CodeCode Available	2	5
InjecGuard: Benchmarking and Mitigating Over-defense in Prompt Injection Guardrail Models	Oct 30, 2024	Benchmarking	CodeCode Available	2	5
InstructLayout: Instruction-Driven 2D and 3D Layout Synthesis with Semantic Graph Prior	Jul 10, 2024	BenchmarkingDecoder	CodeCode Available	2	5
Interactive Agents: Simulating Counselor-Client Psychological Counseling via Role-Playing LLM-to-LLM Interactions	Aug 28, 2024	Benchmarking	CodeCode Available	2	5
LLM-Based Multi-Agent Systems are Scalable Graph Generative Models	Oct 13, 2024	BenchmarkingGraph Generation	CodeCode Available	2	5

Show:10 25 50

← PrevPage 22 of 555Next →

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified