SOTAVerified

Task Planning

Papers

Showing 1-50 of 344 papers

Title | Status | Hype
Agent S: An Open Agentic Framework that Uses Computers Like a Human | Code | 11
Agent S2: A Compositional Generalist-Specialist Framework for Computer Use Agents | Code | 11
NYU CTF Bench: A Scalable Open-Source Benchmark Dataset for Evaluating LLMs in Offensive Security | Code | 11
HuggingGPT: Solving AI Tasks with ChatGPT and its Friends in Hugging Face | Code | 6
Personal LLM Agents: Insights and Survey about the Capability, Efficiency and Security | Code | 5
Prismatic VLMs: Investigating the Design Space of Visually-Conditioned Language Models | Code | 4
AgentPoison: Red-teaming LLM Agents via Poisoning Memory or Knowledge Bases | Code | 3
Tool Learning with Large Language Models: A Survey | Code | 3
RAT: Retrieval Augmented Thoughts Elicit Context-Aware Reasoning in Long-Horizon Generation | Code | 3
A Comprehensive Survey of Deep Research: Systems, Methodologies, and Applications | Code | 3
Small LLMs Are Weak Tool Learners: A Multi-LLM Agent | Code | 3
Remote Sensing ChatGPT: Solving Remote Sensing Tasks with ChatGPT and Visual Models | Code | 3
SafeAgentBench: A Benchmark for Safe Task Planning of Embodied LLM Agents | Code | 2
Describe, Explain, Plan and Select: Interactive Planning with Large Language Models Enables Open-World Multi-Task Agents | Code | 2
RS-Agent: Automating Remote Sensing Tasks through Intelligent Agent | Code | 2
Emma-X: An Embodied Multimodal Action Model with Grounded Chain of Thought and Look-ahead Spatial Reasoning | Code | 2
RoboMatrix: A Skill-centric Hierarchical Framework for Scalable Robot Task Planning and Execution in Open-World | Code | 2
NavRAG: Generating User Demand Instructions for Embodied Navigation through Retrieval-Augmented LLM | Code | 2
Towards Generalizable Vision-Language Robotic Manipulation: A Benchmark and LLM-guided 3D Policy | Code | 2
LLM3: Large Language Model-based Task and Motion Planning with Motion Failure Reasoning | Code | 2
Getting pwn'd by AI: Penetration Testing with Large Language Models | Code | 2
GTA1: GUI Test-time Scaling Agent | Code | 2
LoTa-Bench: Benchmarking Language-oriented Task Planners for Embodied Agents | Code | 2
TrustAgent: Towards Safe and Trustworthy LLM-based Agents | Code | 2
Tool-Planner: Task Planning with Clusters across Multiple Tools | Code | 2
Vinci: A Real-time Embodied Smart Assistant based on Egocentric Vision-Language Model | Code | 2
SkiROS2: A skill-based Robot Control Platform for ROS | Code | 2
COHERENT: Collaboration of Heterogeneous Multi-Robot System with Large Language Models | Code | 2
Can Graph Learning Improve Planning in LLM-based Agents? | Code | 2
D-CIPHER: Dynamic Collaborative Intelligent Multi-Agent System with Planner and Heterogeneous Executors for Offensive Security | Code | 2
WorkflowLLM: Enhancing Workflow Orchestration Capability of Large Language Models | Code | 2
Actionet: An Interactive End-To-End Platform For Task-Based Data Collection And Augmentation In 3D Environment | Code | 1
PlanSys2: A Planning System Framework for ROS2 | Code | 1
Plan-over-Graph: Towards Parallelable LLM Agent Schedule | Code | 1
EARBench: Towards Evaluating Physical Risk Awareness for Task Planning of Foundation Model-based Embodied AI Agents | Code | 1
A Multi-modal Garden Dataset and Hybrid 3D Dense Reconstruction Framework Based on Panoramic Stereo Images for a Trimming Robot | Code | 1
Physical Reasoning and Object Planning for Household Embodied Agents | Code | 1
Planning as In-Painting: A Diffusion-Based Embodied Task Planning Framework for Environments under Uncertainty | Code | 1
New Interaction Paradigm for Complex EDA Software Leveraging GPT | Code | 1
Multi-Modal Grounded Planning and Efficient Replanning For Learning Embodied Agents with A Few Examples | Code | 1
Sequential Manipulation Planning on Scene Graph | Code | 1
LLM-Empowered Embodied Agent for Memory-Augmented Task Planning in Household Robotics | Code | 1
BEDI: A Comprehensive Benchmark for Evaluating Embodied Agents on UAVs | Code | 1
Large Language Models for Multi-Robot Systems: A Survey | Code | 1
LLM+MAP: Bimanual Robot Task Planning using Large Language Models and Planning Domain Definition Language | Code | 1
Integrating Action Knowledge and LLMs for Task Planning and Situation Handling in Open Worlds | Code | 1
Can only LLMs do Reasoning?: Potential of Small Language Models in Task Planning | Code | 1
Kinematic-aware Prompting for Generalizable Articulated Object Manipulation with LLMs | Code | 1
Embodied Task Planning with Large Language Models | Code | 1
Enhancing LLM-Based Agents via Global Planning and Hierarchical Execution | Code | 1

No leaderboard results yet.