SOTAVerified

Benchmarking

Papers


Showing 321–330 of 5548 papers

Title	Date	Tasks	Status	Hype	Score
InfiAgent-DABench: Evaluating Agents on Data Analysis Tasks	Jan 10, 2024	Benchmarking	Code Available	2	5
AiTLAS: Artificial Intelligence Toolbox for Earth Observation	Jan 21, 2022	Benchmarking, Earth Observation	Code Available	2	5
InstructLayout: Instruction-Driven 2D and 3D Layout Synthesis with Semantic Graph Prior	Jul 10, 2024	Benchmarking, Decoder	Code Available	2	5
InstructScene: Instruction-Driven 3D Indoor Scene Synthesis with Semantic Graph Prior	Feb 7, 2024	Benchmarking, Decoder	Code Available	2	5
Decouple and Track: Benchmarking and Improving Video Diffusion Transformers for Motion Transfer	Mar 21, 2025	Benchmarking, Video Generation	Code Available	2	5
Interactive Agents: Simulating Counselor-Client Psychological Counseling via Role-Playing LLM-to-LLM Interactions	Aug 28, 2024	Benchmarking	Code Available	2	5
Craftium: An Extensible Framework for Creating Reinforcement Learning Environments	Jul 4, 2024	Benchmarking, Minecraft	Code Available	2	5
CRMArena-Pro: Holistic Assessment of LLM Agents Across Diverse Business Scenarios and Interactions	May 24, 2025	Benchmarking	Code Available	2	5
Investigating Tradeoffs in Real-World Video Super-Resolution	Nov 24, 2021	Benchmarking, Super-Resolution	Code Available	2	5
CORAL: Benchmarking Multi-turn Conversational Retrieval-Augmentation Generation	Oct 30, 2024	Benchmarking, Passage Retrieval	Code Available	2	5


Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified