Benchmarking

Papers

Showing 421–430 of 5548 papers

Title	Date	Tasks	Status	Hype
Draw ALL Your Imagine: A Holistic Benchmark and Agent Framework for Complex Instruction-based Image Generation	May 30, 2025	All, Benchmarking	Code Available	1
Toward Memory-Aided World Models: Benchmarking via Spatial Consistency	May 29, 2025	Benchmarking, Minecraft	Code Available	1
SVRPBench: A Realistic Benchmark for Stochastic Vehicle Routing Problem	May 28, 2025	Benchmarking	Code Available	1
RedTeamCUA: Realistic Adversarial Testing of Computer-Use Agents in Hybrid Web-OS Environments	May 28, 2025	Benchmarking, Red Teaming	Code Available	1
GoMatching++: Parameter- and Data-Efficient Arbitrary-Shaped Video Text Spotting and Benchmarking	May 28, 2025	Benchmarking, Text Spotting	Code Available	1
Scalable Parameter and Memory Efficient Pretraining for LLM: Recent Algorithmic Advances and Benchmarking	May 28, 2025	Benchmarking	Code Available	1
FM-Planner: Foundation Model Guided Path Planning for Autonomous Drone Navigation	May 27, 2025	Benchmarking, Decision Making	Code Available	1
Bencher: Simple and Reproducible Benchmarking for Black-Box Optimization	May 27, 2025	Benchmarking	Code Available	1
Benchmarking Multimodal Knowledge Conflict for Large Multimodal Models	May 26, 2025	Benchmarking, RAG	Code Available	1
MineAnyBuild: Benchmarking Spatial Planning for Open-world AI Agents	May 26, 2025	Benchmarking, Minecraft	Code Available	1

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified