SOTAVerified

Task Planning

Papers

Showing 1-50 of 344 papers

Title | Status | Hype
NYU CTF Bench: A Scalable Open-Source Benchmark Dataset for Evaluating LLMs in Offensive Security | Code | 11
Agent S: An Open Agentic Framework that Uses Computers Like a Human | Code | 11
Agent S2: A Compositional Generalist-Specialist Framework for Computer Use Agents | Code | 11
HuggingGPT: Solving AI Tasks with ChatGPT and its Friends in Hugging Face | Code | 6
Personal LLM Agents: Insights and Survey about the Capability, Efficiency and Security | Code | 5
Prismatic VLMs: Investigating the Design Space of Visually-Conditioned Language Models | Code | 4
Tool Learning with Large Language Models: A Survey | Code | 3
AgentPoison: Red-teaming LLM Agents via Poisoning Memory or Knowledge Bases | Code | 3
A Comprehensive Survey of Deep Research: Systems, Methodologies, and Applications | Code | 3
Small LLMs Are Weak Tool Learners: A Multi-LLM Agent | Code | 3
Remote Sensing ChatGPT: Solving Remote Sensing Tasks with ChatGPT and Visual Models | Code | 3
RAT: Retrieval Augmented Thoughts Elicit Context-Aware Reasoning in Long-Horizon Generation | Code | 3
Describe, Explain, Plan and Select: Interactive Planning with Large Language Models Enables Open-World Multi-Task Agents | Code | 2
LLM3: Large Language Model-based Task and Motion Planning with Motion Failure Reasoning | Code | 2
COHERENT: Collaboration of Heterogeneous Multi-Robot System with Large Language Models | Code | 2
Tool-Planner: Task Planning with Clusters across Multiple Tools | Code | 2
RoboMatrix: A Skill-centric Hierarchical Framework for Scalable Robot Task Planning and Execution in Open-World | Code | 2
D-CIPHER: Dynamic Collaborative Intelligent Multi-Agent System with Planner and Heterogeneous Executors for Offensive Security | Code | 2
NavRAG: Generating User Demand Instructions for Embodied Navigation through Retrieval-Augmented LLM | Code | 2
WorkflowLLM: Enhancing Workflow Orchestration Capability of Large Language Models | Code | 2
Emma-X: An Embodied Multimodal Action Model with Grounded Chain of Thought and Look-ahead Spatial Reasoning | Code | 2
Towards Generalizable Vision-Language Robotic Manipulation: A Benchmark and LLM-guided 3D Policy | Code | 2
LoTa-Bench: Benchmarking Language-oriented Task Planners for Embodied Agents | Code | 2
TrustAgent: Towards Safe and Trustworthy LLM-based Agents | Code | 2
Getting pwn'd by AI: Penetration Testing with Large Language Models | Code | 2
SkiROS2: A skill-based Robot Control Platform for ROS | Code | 2
GTA1: GUI Test-time Scaling Agent | Code | 2
Can Graph Learning Improve Planning in LLM-based Agents? | Code | 2
RS-Agent: Automating Remote Sensing Tasks through Intelligent Agent | Code | 2
SafeAgentBench: A Benchmark for Safe Task Planning of Embodied LLM Agents | Code | 2
Vinci: A Real-time Embodied Smart Assistant based on Egocentric Vision-Language Model | Code | 2
Kinematic-aware Prompting for Generalizable Articulated Object Manipulation with LLMs | Code | 1
Actionet: An Interactive End-To-End Platform For Task-Based Data Collection And Augmentation In 3D Environment | Code | 1
Integrating Action Knowledge and LLMs for Task Planning and Situation Handling in Open Worlds | Code | 1
EARBench: Towards Evaluating Physical Risk Awareness for Task Planning of Foundation Model-based Embodied AI Agents | Code | 1
A Multi-modal Garden Dataset and Hybrid 3D Dense Reconstruction Framework Based on Panoramic Stereo Images for a Trimming Robot | Code | 1
Plan-over-Graph: Towards Parallelable LLM Agent Schedule | Code | 1
Extended Tree Search for Robot Task and Motion Planning | Code | 1
FlySearch: Exploring how vision-language models explore | Code | 1
PlanSys2: A Planning System Framework for ROS2 | Code | 1
Enhancing LLM-Based Agents via Global Planning and Hierarchical Execution | Code | 1
BEDI: A Comprehensive Benchmark for Evaluating Embodied Agents on UAVs | Code | 1
Embodied Task Planning with Large Language Models | Code | 1
DISCO: Embodied Navigation and Interaction via Differentiable Scene Semantics and Dual-level Control | Code | 1
New Interaction Paradigm for Complex EDA Software Leveraging GPT | Code | 1
Multi-Modal Grounded Planning and Efficient Replanning For Learning Embodied Agents with A Few Examples | Code | 1
EgoPlan-Bench: Benchmarking Multimodal Large Language Models for Human-Level Planning | Code | 1
Physical Reasoning and Object Planning for Household Embodied Agents | Code | 1
Sequential Manipulation Planning on Scene Graph | Code | 1
Data-Agnostic Robotic Long-Horizon Manipulation with Vision-Language-Guided Closed-Loop Feedback | Code | 1

No leaderboard results yet.