| Agent S2: A Compositional Generalist-Specialist Framework for Computer Use Agents | Apr 1, 2025 | AI Agent, Task Planning | Code Available | 11 |
| Agent S: An Open Agentic Framework that Uses Computers Like a Human | Oct 10, 2024 | AI Agent, Task Planning | Code Available | 11 |
| NYU CTF Bench: A Scalable Open-Source Benchmark Dataset for Evaluating LLMs in Offensive Security | Jun 8, 2024 | Task Planning, Vulnerability Detection | Code Available | 11 |
| HuggingGPT: Solving AI Tasks with ChatGPT and its Friends in Hugging Face | Mar 30, 2023 | Automatic Machine Learning Model Selection, Model Selection | Code Available | 6 |
| Personal LLM Agents: Insights and Survey about the Capability, Efficiency and Security | Jan 10, 2024 | Task Planning | Code Available | 5 |
| Prismatic VLMs: Investigating the Design Space of Visually-Conditioned Language Models | Feb 12, 2024 | Hallucination, Object Localization | Code Available | 4 |
| A Comprehensive Survey of Deep Research: Systems, Methodologies, and Applications | Jun 14, 2025 | Information Retrieval, Survey | Code Available | 3 |
| AgentPoison: Red-teaming LLM Agents via Poisoning Memory or Knowledge Bases | Jul 17, 2024 | Autonomous Driving, Backdoor Attack | Code Available | 3 |
| Tool Learning with Large Language Models: A Survey | May 28, 2024 | Response Generation, Survey | Code Available | 3 |
| RAT: Retrieval Augmented Thoughts Elicit Context-Aware Reasoning in Long-Horizon Generation | Mar 8, 2024 | Code Generation, Hallucination | Code Available | 3 |
| Remote Sensing ChatGPT: Solving Remote Sensing Tasks with ChatGPT and Visual Models | Jan 17, 2024 | Task Planning | Code Available | 3 |
| Small LLMs Are Weak Tool Learners: A Multi-LLM Agent | Jan 14, 2024 | Language Modelling, Large Language Model | Code Available | 3 |
| GTA1: GUI Test-time Scaling Agent | Jul 8, 2025 | Reinforcement Learning (RL), Task Planning | Code Available | 2 |
| NavRAG: Generating User Demand Instructions for Embodied Navigation through Retrieval-Augmented LLM | Feb 16, 2025 | Navigate, RAG | Code Available | 2 |
| D-CIPHER: Dynamic Collaborative Intelligent Multi-Agent System with Planner and Heterogeneous Executors for Offensive Security | Feb 15, 2025 | Task Planning | Code Available | 2 |
| Vinci: A Real-time Embodied Smart Assistant based on Egocentric Vision-Language Model | Dec 30, 2024 | Language Modeling, Language Modelling | Code Available | 2 |
| SafeAgentBench: A Benchmark for Safe Task Planning of Embodied LLM Agents | Dec 17, 2024 | Task Planning | Code Available | 2 |
| Emma-X: An Embodied Multimodal Action Model with Grounded Chain of Thought and Look-ahead Spatial Reasoning | Dec 16, 2024 | Hallucination, Robot Manipulation | Code Available | 2 |
| RoboMatrix: A Skill-centric Hierarchical Framework for Scalable Robot Task Planning and Execution in Open-World | Nov 29, 2024 | Robot Task Planning, Scheduling | Code Available | 2 |
| WorkflowLLM: Enhancing Workflow Orchestration Capability of Large Language Models | Nov 8, 2024 | Task Planning, Zero-shot Generalization | Code Available | 2 |
| Towards Generalizable Vision-Language Robotic Manipulation: A Benchmark and LLM-guided 3D Policy | Oct 2, 2024 | Motion Planning, Robot Manipulation | Code Available | 2 |
| COHERENT: Collaboration of Heterogeneous Multi-Robot System with Large Language Models | Sep 23, 2024 | Robot Task Planning, Task Planning | Code Available | 2 |
| RS-Agent: Automating Remote Sensing Tasks through Intelligent Agent | Jun 11, 2024 | AI Agent, Descriptive | Code Available | 2 |
| Tool-Planner: Task Planning with Clusters across Multiple Tools | Jun 6, 2024 | Language Modelling, Large Language Model | Code Available | 2 |
| Can Graph Learning Improve Planning in LLM-based Agents? | May 29, 2024 | Decision Making, Graph Learning | Code Available | 2 |
| LLM3:Large Language Model-based Task and Motion Planning with Motion Failure Reasoning | Mar 18, 2024 | Language Modeling, Language Modelling | Code Available | 2 |
| LoTa-Bench: Benchmarking Language-oriented Task Planners for Embodied Agents | Feb 13, 2024 | Benchmarking, Model Selection | Code Available | 2 |
| TrustAgent: Towards Safe and Trustworthy LLM-based Agents | Feb 2, 2024 | Task Planning | Code Available | 2 |
| Getting pwn'd by AI: Penetration Testing with Large Language Models | Jul 24, 2023 | Ethics, Task Planning | Code Available | 2 |
| SkiROS2: A skill-based Robot Control Platform for ROS | Jun 29, 2023 | Scheduling, Task Planning | Code Available | 2 |
| Describe, Explain, Plan and Select: Interactive Planning with Large Language Models Enables Open-World Multi-Task Agents | Feb 3, 2023 | Minecraft, Task Planning | Code Available | 2 |
| FlySearch: Exploring how vision-language models explore | Jun 3, 2025 | Hallucination, Task Planning | Code Available | 1 |
| BEDI: A Comprehensive Benchmark for Evaluating Embodied Agents on UAVs | May 23, 2025 | Model Optimization, Task Planning | Code Available | 1 |
| CRAKEN: Cybersecurity LLM Agent with Knowledge-Based Execution | May 21, 2025 | Large Language Model, Task Planning | Code Available | 1 |
| LLM-Empowered Embodied Agent for Memory-Augmented Task Planning in Household Robotics | Apr 30, 2025 | In-Context Learning, Object | Code Available | 1 |
| Enhancing LLM-Based Agents via Global Planning and Hierarchical Execution | Apr 23, 2025 | Task Planning | Code Available | 1 |
| Data-Agnostic Robotic Long-Horizon Manipulation with Vision-Language-Guided Closed-Loop Feedback | Mar 27, 2025 | Task Planning | Code Available | 1 |
| LLM+MAP: Bimanual Robot Task Planning using Large Language Models and Planning Domain Definition Language | Mar 21, 2025 | In-Context Learning, Robot Task Planning | Code Available | 1 |
| MRBTP: Efficient Multi-Robot Behavior Tree Planning and Collaboration | Feb 25, 2025 | Robot Task Planning, Task Planning | Code Available | 1 |
| Plan-over-Graph: Towards Parallelable LLM Agent Schedule | Feb 20, 2025 | Task Planning | Code Available | 1 |
| Robotouille: An Asynchronous Planning Benchmark for LLM Agents | Feb 6, 2025 | Language Modeling, Language Modelling | Code Available | 1 |
| Large Language Models for Multi-Robot Systems: A Survey | Feb 6, 2025 | Action Generation, Benchmarking | Code Available | 1 |
| VLM-driven Behavior Tree for Context-aware Task Planning | Jan 7, 2025 | Task Planning | Code Available | 1 |
| Multi-Modal Grounded Planning and Efficient Replanning For Learning Embodied Agents with A Few Examples | Dec 23, 2024 | Common Sense Reasoning, Task Planning | Code Available | 1 |
| Closed-Loop Long-Horizon Robotic Planning via Equilibrium Sequence Modeling | Oct 2, 2024 | Language Modeling, Language Modelling | Code Available | 1 |
| SELP: Generating Safe and Efficient Task Plans for Robot Agents with Large Language Models | Sep 28, 2024 | Drone navigation, Robot Manipulation | Code Available | 1 |
| EARBench: Towards Evaluating Physical Risk Awareness for Task Planning of Foundation Model-based Embodied AI Agents | Aug 8, 2024 | Scene Generation, Task Planning | Code Available | 1 |
| Wonderful Team: Zero-Shot Physical Task Planning with Visual LLMs | Jul 26, 2024 | Action Generation, Large Language Model | Code Available | 1 |
| DISCO: Embodied Navigation and Interaction via Differentiable Scene Semantics and Dual-level Control | Jul 20, 2024 | Instruction Following, Navigate | Code Available | 1 |
| Can only LLMs do Reasoning?: Potential of Small Language Models in Task Planning | Apr 5, 2024 | Task Planning | Code Available | 1 |