SOTAVerified

Task Planning

Papers

Showing 1-25 of 344 papers

Title | Status | Hype
Agent S: An Open Agentic Framework that Uses Computers Like a Human | Code | 11
NYU CTF Bench: A Scalable Open-Source Benchmark Dataset for Evaluating LLMs in Offensive Security | Code | 11
Agent S2: A Compositional Generalist-Specialist Framework for Computer Use Agents | Code | 11
HuggingGPT: Solving AI Tasks with ChatGPT and its Friends in Hugging Face | Code | 6
Personal LLM Agents: Insights and Survey about the Capability, Efficiency and Security | Code | 5
Prismatic VLMs: Investigating the Design Space of Visually-Conditioned Language Models | Code | 4
RAT: Retrieval Augmented Thoughts Elicit Context-Aware Reasoning in Long-Horizon Generation | Code | 3
Remote Sensing ChatGPT: Solving Remote Sensing Tasks with ChatGPT and Visual Models | Code | 3
AgentPoison: Red-teaming LLM Agents via Poisoning Memory or Knowledge Bases | Code | 3
Tool Learning with Large Language Models: A Survey | Code | 3
A Comprehensive Survey of Deep Research: Systems, Methodologies, and Applications | Code | 3
Small LLMs Are Weak Tool Learners: A Multi-LLM Agent | Code | 3
RS-Agent: Automating Remote Sensing Tasks through Intelligent Agent | Code | 2
RoboMatrix: A Skill-centric Hierarchical Framework for Scalable Robot Task Planning and Execution in Open-World | Code | 2
SafeAgentBench: A Benchmark for Safe Task Planning of Embodied LLM Agents | Code | 2
Emma-X: An Embodied Multimodal Action Model with Grounded Chain of Thought and Look-ahead Spatial Reasoning | Code | 2
Getting pwn'd by AI: Penetration Testing with Large Language Models | Code | 2
SkiROS2: A skill-based Robot Control Platform for ROS | Code | 2
D-CIPHER: Dynamic Collaborative Intelligent Multi-Agent System with Planner and Heterogeneous Executors for Offensive Security | Code | 2
Describe, Explain, Plan and Select: Interactive Planning with Large Language Models Enables Open-World Multi-Task Agents | Code | 2
LLM3:Large Language Model-based Task and Motion Planning with Motion Failure Reasoning | Code | 2
COHERENT: Collaboration of Heterogeneous Multi-Robot System with Large Language Models | Code | 2
LoTa-Bench: Benchmarking Language-oriented Task Planners for Embodied Agents | Code | 2
GTA1: GUI Test-time Scaling Agent | Code | 2
Can Graph Learning Improve Planning in LLM-based Agents? | Code | 2

No leaderboard results yet.