| Agent S: An Open Agentic Framework that Uses Computers Like a Human | Oct 10, 2024 | AI Agent, Task Planning | Code Available | 11 | 5 |
| Agent S2: A Compositional Generalist-Specialist Framework for Computer Use Agents | Apr 1, 2025 | AI Agent, Task Planning | Code Available | 11 | 5 |
| NYU CTF Bench: A Scalable Open-Source Benchmark Dataset for Evaluating LLMs in Offensive Security | Jun 8, 2024 | Task Planning, Vulnerability Detection | Code Available | 11 | 5 |
| HuggingGPT: Solving AI Tasks with ChatGPT and its Friends in Hugging Face | Mar 30, 2023 | Automatic Machine Learning Model Selection, Model Selection | Code Available | 6 | 5 |
| Personal LLM Agents: Insights and Survey about the Capability, Efficiency and Security | Jan 10, 2024 | Task Planning | Code Available | 5 | 5 |
| Prismatic VLMs: Investigating the Design Space of Visually-Conditioned Language Models | Feb 12, 2024 | Hallucination, Object Localization | Code Available | 4 | 5 |
| Remote Sensing ChatGPT: Solving Remote Sensing Tasks with ChatGPT and Visual Models | Jan 17, 2024 | Task Planning | Code Available | 3 | 5 |
| RAT: Retrieval Augmented Thoughts Elicit Context-Aware Reasoning in Long-Horizon Generation | Mar 8, 2024 | Code Generation, Hallucination | Code Available | 3 | 5 |
| AgentPoison: Red-teaming LLM Agents via Poisoning Memory or Knowledge Bases | Jul 17, 2024 | Autonomous Driving, Backdoor Attack | Code Available | 3 | 5 |
| Tool Learning with Large Language Models: A Survey | May 28, 2024 | Response Generation, Survey | Code Available | 3 | 5 |
| A Comprehensive Survey of Deep Research: Systems, Methodologies, and Applications | Jun 14, 2025 | Information Retrieval, Survey | Code Available | 3 | 5 |
| Small LLMs Are Weak Tool Learners: A Multi-LLM Agent | Jan 14, 2024 | Language Modelling, Large Language Model | Code Available | 3 | 5 |
| RS-Agent: Automating Remote Sensing Tasks through Intelligent Agent | Jun 11, 2024 | AI Agent, Descriptive | Code Available | 2 | 5 |
| D-CIPHER: Dynamic Collaborative Intelligent Multi-Agent System with Planner and Heterogeneous Executors for Offensive Security | Feb 15, 2025 | Task Planning | Code Available | 2 | 5 |
| SafeAgentBench: A Benchmark for Safe Task Planning of Embodied LLM Agents | Dec 17, 2024 | Task Planning | Code Available | 2 | 5 |
| Getting pwn'd by AI: Penetration Testing with Large Language Models | Jul 24, 2023 | Ethics, Task Planning | Code Available | 2 | 5 |
| RoboMatrix: A Skill-centric Hierarchical Framework for Scalable Robot Task Planning and Execution in Open-World | Nov 29, 2024 | Robot Task Planning, Scheduling | Code Available | 2 | 5 |
| SkiROS2: A skill-based Robot Control Platform for ROS | Jun 29, 2023 | Scheduling, Task Planning | Code Available | 2 | 5 |
| COHERENT: Collaboration of Heterogeneous Multi-Robot System with Large Language Models | Sep 23, 2024 | Robot Task Planning, Task Planning | Code Available | 2 | 5 |
| NavRAG: Generating User Demand Instructions for Embodied Navigation through Retrieval-Augmented LLM | Feb 16, 2025 | Navigate, RAG | Code Available | 2 | 5 |
| Can Graph Learning Improve Planning in LLM-based Agents? | May 29, 2024 | Decision Making, Graph Learning | Code Available | 2 | 5 |
| Describe, Explain, Plan and Select: Interactive Planning with Large Language Models Enables Open-World Multi-Task Agents | Feb 3, 2023 | Minecraft, Task Planning | Code Available | 2 | 5 |
| LLM3: Large Language Model-based Task and Motion Planning with Motion Failure Reasoning | Mar 18, 2024 | Language Modeling, Language Modelling | Code Available | 2 | 5 |
| GTA1: GUI Test-time Scaling Agent | Jul 8, 2025 | Reinforcement Learning (RL), Task Planning | Code Available | 2 | 5 |
| Emma-X: An Embodied Multimodal Action Model with Grounded Chain of Thought and Look-ahead Spatial Reasoning | Dec 16, 2024 | Hallucination, Robot Manipulation | Code Available | 2 | 5 |