| Agent S: An Open Agentic Framework that Uses Computers Like a Human | Oct 10, 2024 | AI Agent, Task Planning | Code Available | 11 |
| Agent S2: A Compositional Generalist-Specialist Framework for Computer Use Agents | Apr 1, 2025 | AI Agent, Task Planning | Code Available | 11 |
| NYU CTF Bench: A Scalable Open-Source Benchmark Dataset for Evaluating LLMs in Offensive Security | Jun 8, 2024 | Task Planning, Vulnerability Detection | Code Available | 11 |
| HuggingGPT: Solving AI Tasks with ChatGPT and its Friends in Hugging Face | Mar 30, 2023 | Automatic Machine Learning Model Selection, Model Selection | Code Available | 6 |
| Personal LLM Agents: Insights and Survey about the Capability, Efficiency and Security | Jan 10, 2024 | Task Planning | Code Available | 5 |
| Prismatic VLMs: Investigating the Design Space of Visually-Conditioned Language Models | Feb 12, 2024 | Hallucination, Object Localization | Code Available | 4 |
| AgentPoison: Red-teaming LLM Agents via Poisoning Memory or Knowledge Bases | Jul 17, 2024 | Autonomous Driving, Backdoor Attack | Code Available | 3 |
| Tool Learning with Large Language Models: A Survey | May 28, 2024 | Response Generation, Survey | Code Available | 3 |
| RAT: Retrieval Augmented Thoughts Elicit Context-Aware Reasoning in Long-Horizon Generation | Mar 8, 2024 | Code Generation, Hallucination | Code Available | 3 |
| A Comprehensive Survey of Deep Research: Systems, Methodologies, and Applications | Jun 14, 2025 | Information Retrieval, Survey | Code Available | 3 |
| Small LLMs Are Weak Tool Learners: A Multi-LLM Agent | Jan 14, 2024 | Language Modelling, Large Language Model | Code Available | 3 |
| Remote Sensing ChatGPT: Solving Remote Sensing Tasks with ChatGPT and Visual Models | Jan 17, 2024 | Task Planning | Code Available | 3 |
| SafeAgentBench: A Benchmark for Safe Task Planning of Embodied LLM Agents | Dec 17, 2024 | Task Planning | Code Available | 2 |
| Describe, Explain, Plan and Select: Interactive Planning with Large Language Models Enables Open-World Multi-Task Agents | Feb 3, 2023 | Minecraft, Task Planning | Code Available | 2 |
| RS-Agent: Automating Remote Sensing Tasks through Intelligent Agent | Jun 11, 2024 | AI Agent, Descriptive | Code Available | 2 |
| Emma-X: An Embodied Multimodal Action Model with Grounded Chain of Thought and Look-ahead Spatial Reasoning | Dec 16, 2024 | Hallucination, Robot Manipulation | Code Available | 2 |
| RoboMatrix: A Skill-centric Hierarchical Framework for Scalable Robot Task Planning and Execution in Open-World | Nov 29, 2024 | Robot Task Planning, Scheduling | Code Available | 2 |
| NavRAG: Generating User Demand Instructions for Embodied Navigation through Retrieval-Augmented LLM | Feb 16, 2025 | Navigate, RAG | Code Available | 2 |
| Towards Generalizable Vision-Language Robotic Manipulation: A Benchmark and LLM-guided 3D Policy | Oct 2, 2024 | Motion Planning, Robot Manipulation | Code Available | 2 |
| LLM3: Large Language Model-based Task and Motion Planning with Motion Failure Reasoning | Mar 18, 2024 | Language Modeling, Language Modelling | Code Available | 2 |
| Getting pwn'd by AI: Penetration Testing with Large Language Models | Jul 24, 2023 | Ethics, Task Planning | Code Available | 2 |
| GTA1: GUI Test-time Scaling Agent | Jul 8, 2025 | Reinforcement Learning (RL), Task Planning | Code Available | 2 |
| LoTa-Bench: Benchmarking Language-oriented Task Planners for Embodied Agents | Feb 13, 2024 | Benchmarking, Model Selection | Code Available | 2 |
| TrustAgent: Towards Safe and Trustworthy LLM-based Agents | Feb 2, 2024 | Task Planning | Code Available | 2 |
| Tool-Planner: Task Planning with Clusters across Multiple Tools | Jun 6, 2024 | Language Modelling, Large Language Model | Code Available | 2 |
| Vinci: A Real-time Embodied Smart Assistant based on Egocentric Vision-Language Model | Dec 30, 2024 | Language Modeling, Language Modelling | Code Available | 2 |
| SkiROS2: A skill-based Robot Control Platform for ROS | Jun 29, 2023 | Scheduling, Task Planning | Code Available | 2 |
| COHERENT: Collaboration of Heterogeneous Multi-Robot System with Large Language Models | Sep 23, 2024 | Robot Task Planning, Task Planning | Code Available | 2 |
| Can Graph Learning Improve Planning in LLM-based Agents? | May 29, 2024 | Decision Making, Graph Learning | Code Available | 2 |
| D-CIPHER: Dynamic Collaborative Intelligent Multi-Agent System with Planner and Heterogeneous Executors for Offensive Security | Feb 15, 2025 | Task Planning | Code Available | 2 |
| WorkflowLLM: Enhancing Workflow Orchestration Capability of Large Language Models | Nov 8, 2024 | Task Planning, Zero-shot Generalization | Code Available | 2 |
| Actionet: An Interactive End-To-End Platform For Task-Based Data Collection And Augmentation In 3D Environment | Oct 3, 2020 | Dataset Generation, Task Planning | Code Available | 1 |
| PlanSys2: A Planning System Framework for ROS2 | Jul 1, 2021 | Task Planning | Code Available | 1 |
| Plan-over-Graph: Towards Parallelable LLM Agent Schedule | Feb 20, 2025 | Task Planning | Code Available | 1 |
| EARBench: Towards Evaluating Physical Risk Awareness for Task Planning of Foundation Model-based Embodied AI Agents | Aug 8, 2024 | Scene Generation, Task Planning | Code Available | 1 |
| A Multi-modal Garden Dataset and Hybrid 3D Dense Reconstruction Framework Based on Panoramic Stereo Images for a Trimming Robot | May 10, 2023 | Task Planning | Code Available | 1 |
| Physical Reasoning and Object Planning for Household Embodied Agents | Nov 22, 2023 | 2k, Decision Making | Code Available | 1 |
| Planning as In-Painting: A Diffusion-Based Embodied Task Planning Framework for Environments under Uncertainty | Dec 2, 2023 | Denoising, Task Planning | Code Available | 1 |
| New Interaction Paradigm for Complex EDA Software Leveraging GPT | Jul 27, 2023 | Task Planning | Code Available | 1 |
| Multi-Modal Grounded Planning and Efficient Replanning For Learning Embodied Agents with A Few Examples | Dec 23, 2024 | Common Sense Reasoning, Task Planning | Code Available | 1 |
| Sequential Manipulation Planning on Scene Graph | Jul 10, 2022 | Object Rearrangement, Stochastic Optimization | Code Available | 1 |
| LLM-Empowered Embodied Agent for Memory-Augmented Task Planning in Household Robotics | Apr 30, 2025 | In-Context Learning, Object | Code Available | 1 |
| BEDI: A Comprehensive Benchmark for Evaluating Embodied Agents on UAVs | May 23, 2025 | Model Optimization, Task Planning | Code Available | 1 |
| Large Language Models for Multi-Robot Systems: A Survey | Feb 6, 2025 | Action Generation, Benchmarking | Code Available | 1 |
| LLM+MAP: Bimanual Robot Task Planning using Large Language Models and Planning Domain Definition Language | Mar 21, 2025 | In-Context Learning, Robot Task Planning | Code Available | 1 |
| Integrating Action Knowledge and LLMs for Task Planning and Situation Handling in Open Worlds | May 27, 2023 | Task Planning, World Knowledge | Code Available | 1 |
| Can only LLMs do Reasoning?: Potential of Small Language Models in Task Planning | Apr 5, 2024 | Task Planning | Code Available | 1 |
| Kinematic-aware Prompting for Generalizable Articulated Object Manipulation with LLMs | Nov 6, 2023 | Imitation Learning, In-Context Learning | Code Available | 1 |
| Embodied Task Planning with Large Language Models | Jul 4, 2023 | Task Planning | Code Available | 1 |
| Enhancing LLM-Based Agents via Global Planning and Hierarchical Execution | Apr 23, 2025 | Task Planning | Code Available | 1 |