SOTAVerified

Task Planning

Papers

Showing 1-10 of 344 papers

Title | Status | Hype
Agent S2: A Compositional Generalist-Specialist Framework for Computer Use Agents | Code | 11
Agent S: An Open Agentic Framework that Uses Computers Like a Human | Code | 11
NYU CTF Bench: A Scalable Open-Source Benchmark Dataset for Evaluating LLMs in Offensive Security | Code | 11
HuggingGPT: Solving AI Tasks with ChatGPT and its Friends in Hugging Face | Code | 6
Personal LLM Agents: Insights and Survey about the Capability, Efficiency and Security | Code | 5
Prismatic VLMs: Investigating the Design Space of Visually-Conditioned Language Models | Code | 4
A Comprehensive Survey of Deep Research: Systems, Methodologies, and Applications | Code | 3
AgentPoison: Red-teaming LLM Agents via Poisoning Memory or Knowledge Bases | Code | 3
Tool Learning with Large Language Models: A Survey | Code | 3
RAT: Retrieval Augmented Thoughts Elicit Context-Aware Reasoning in Long-Horizon Generation | Code | 3

No leaderboard results yet.