Task Planning

Papers

Showing 1–10 of 344 papers

Title	Date	Tasks	Status	Hype
Agent S2: A Compositional Generalist-Specialist Framework for Computer Use Agents	Apr 1, 2025	AI Agent, Task Planning	Code Available	11
Agent S: An Open Agentic Framework that Uses Computers Like a Human	Oct 10, 2024	AI Agent, Task Planning	Code Available	11
NYU CTF Bench: A Scalable Open-Source Benchmark Dataset for Evaluating LLMs in Offensive Security	Jun 8, 2024	Task Planning, Vulnerability Detection	Code Available	11
HuggingGPT: Solving AI Tasks with ChatGPT and its Friends in Hugging Face	Mar 30, 2023	Automatic Machine Learning Model Selection, Model Selection	Code Available	6
Personal LLM Agents: Insights and Survey about the Capability, Efficiency and Security	Jan 10, 2024	Task Planning	Code Available	5
Prismatic VLMs: Investigating the Design Space of Visually-Conditioned Language Models	Feb 12, 2024	Hallucination, Object Localization	Code Available	4
A Comprehensive Survey of Deep Research: Systems, Methodologies, and Applications	Jun 14, 2025	Information Retrieval, Survey	Code Available	3
AgentPoison: Red-teaming LLM Agents via Poisoning Memory or Knowledge Bases	Jul 17, 2024	Autonomous Driving, Backdoor Attack	Code Available	3
Tool Learning with Large Language Models: A Survey	May 28, 2024	Response Generation, Survey	Code Available	3
RAT: Retrieval Augmented Thoughts Elicit Context-Aware Reasoning in Long-Horizon Generation	Mar 8, 2024	Code Generation, Hallucination	Code Available	3

No leaderboard results yet.