SOTAVerified

Task Planning

Papers

Showing 1-25 of 344 papers

Title | Status | Hype
Agent S2: A Compositional Generalist-Specialist Framework for Computer Use Agents | Code | 11
Agent S: An Open Agentic Framework that Uses Computers Like a Human | Code | 11
NYU CTF Bench: A Scalable Open-Source Benchmark Dataset for Evaluating LLMs in Offensive Security | Code | 11
HuggingGPT: Solving AI Tasks with ChatGPT and its Friends in Hugging Face | Code | 6
Personal LLM Agents: Insights and Survey about the Capability, Efficiency and Security | Code | 5
Prismatic VLMs: Investigating the Design Space of Visually-Conditioned Language Models | Code | 4
A Comprehensive Survey of Deep Research: Systems, Methodologies, and Applications | Code | 3
AgentPoison: Red-teaming LLM Agents via Poisoning Memory or Knowledge Bases | Code | 3
Tool Learning with Large Language Models: A Survey | Code | 3
RAT: Retrieval Augmented Thoughts Elicit Context-Aware Reasoning in Long-Horizon Generation | Code | 3
Remote Sensing ChatGPT: Solving Remote Sensing Tasks with ChatGPT and Visual Models | Code | 3
Small LLMs Are Weak Tool Learners: A Multi-LLM Agent | Code | 3
GTA1: GUI Test-time Scaling Agent | Code | 2
NavRAG: Generating User Demand Instructions for Embodied Navigation through Retrieval-Augmented LLM | Code | 2
D-CIPHER: Dynamic Collaborative Intelligent Multi-Agent System with Planner and Heterogeneous Executors for Offensive Security | Code | 2
Vinci: A Real-time Embodied Smart Assistant based on Egocentric Vision-Language Model | Code | 2
SafeAgentBench: A Benchmark for Safe Task Planning of Embodied LLM Agents | Code | 2
Emma-X: An Embodied Multimodal Action Model with Grounded Chain of Thought and Look-ahead Spatial Reasoning | Code | 2
RoboMatrix: A Skill-centric Hierarchical Framework for Scalable Robot Task Planning and Execution in Open-World | Code | 2
WorkflowLLM: Enhancing Workflow Orchestration Capability of Large Language Models | Code | 2
Towards Generalizable Vision-Language Robotic Manipulation: A Benchmark and LLM-guided 3D Policy | Code | 2
COHERENT: Collaboration of Heterogeneous Multi-Robot System with Large Language Models | Code | 2
RS-Agent: Automating Remote Sensing Tasks through Intelligent Agent | Code | 2
Tool-Planner: Task Planning with Clusters across Multiple Tools | Code | 2
Can Graph Learning Improve Planning in LLM-based Agents? | Code | 2

No leaderboard results yet.