| Agent S: An Open Agentic Framework that Uses Computers Like a Human | Oct 10, 2024 | AI AgentTask Planning | CodeCode Available | 11 |
| Agent S2: A Compositional Generalist-Specialist Framework for Computer Use Agents | Apr 1, 2025 | AI AgentTask Planning | CodeCode Available | 11 |
| NYU CTF Bench: A Scalable Open-Source Benchmark Dataset for Evaluating LLMs in Offensive Security | Jun 8, 2024 | Task PlanningVulnerability Detection | CodeCode Available | 11 |
| HuggingGPT: Solving AI Tasks with ChatGPT and its Friends in Hugging Face | Mar 30, 2023 | Automatic Machine Learning Model SelectionModel Selection | CodeCode Available | 6 |
| Personal LLM Agents: Insights and Survey about the Capability, Efficiency and Security | Jan 10, 2024 | Task Planning | CodeCode Available | 5 |
| Prismatic VLMs: Investigating the Design Space of Visually-Conditioned Language Models | Feb 12, 2024 | HallucinationObject Localization | CodeCode Available | 4 |
| RAT: Retrieval Augmented Thoughts Elicit Context-Aware Reasoning in Long-Horizon Generation | Mar 8, 2024 | Code GenerationHallucination | CodeCode Available | 3 |
| AgentPoison: Red-teaming LLM Agents via Poisoning Memory or Knowledge Bases | Jul 17, 2024 | Autonomous DrivingBackdoor Attack | CodeCode Available | 3 |
| A Comprehensive Survey of Deep Research: Systems, Methodologies, and Applications | Jun 14, 2025 | Information RetrievalSurvey | CodeCode Available | 3 |
| Remote Sensing ChatGPT: Solving Remote Sensing Tasks with ChatGPT and Visual Models | Jan 17, 2024 | Task Planning | CodeCode Available | 3 |