Benchmarking

Papers

Showing 261–270 of 5548 papers

Title	Date	Tasks	Status	Hype	Score
PyGraft: Configurable Generation of Synthetic Schemas and Knowledge Graphs at Your Fingertips	Sep 7, 2023	Benchmarking, Knowledge Graphs	Code Available	2	5
DrafterBench: Benchmarking Large Language Models for Tasks Automation in Civil Engineering	Jul 15, 2025	Benchmarking, Instruction Following	Code Available	2	5
LLM-Based Multi-Agent Systems are Scalable Graph Generative Models	Oct 13, 2024	Benchmarking, Graph Generation	Code Available	2	5
AutoPenBench: Benchmarking Generative Agents for Penetration Testing	Oct 4, 2024	Benchmarking	Code Available	2	5
MMLongBench: Benchmarking Long-Context Vision-Language Models Effectively and Thoroughly	May 15, 2025	8k, Benchmarking	Code Available	2	5
AutoTrust: Benchmarking Trustworthiness in Large Vision Language Models for Autonomous Driving	Dec 19, 2024	Autonomous Driving, Benchmarking	Code Available	2	5
State-specific protein-ligand complex structure prediction with a multi-scale deep generative model	Sep 30, 2022	Benchmarking, Blind Docking	Code Available	2	5
Event-Based Motion Magnification	Feb 19, 2024	Benchmarking, Motion Detection	Code Available	2	5
GSCodec Studio: A Modular Framework for Gaussian Splat Compression	Jun 2, 2025	Benchmarking	Code Available	2	5
Decouple and Track: Benchmarking and Improving Video Diffusion Transformers for Motion Transfer	Mar 21, 2025	Benchmarking, Video Generation	Code Available	2	5

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified