
Task Planning

Papers

Showing 1–10 of 344 papers

Title | Status | Hype
Agent S: An Open Agentic Framework that Uses Computers Like a Human | Code | 11
Agent S2: A Compositional Generalist-Specialist Framework for Computer Use Agents | Code | 11
NYU CTF Bench: A Scalable Open-Source Benchmark Dataset for Evaluating LLMs in Offensive Security | Code | 11
HuggingGPT: Solving AI Tasks with ChatGPT and its Friends in Hugging Face | Code | 6
Personal LLM Agents: Insights and Survey about the Capability, Efficiency and Security | Code | 5
Prismatic VLMs: Investigating the Design Space of Visually-Conditioned Language Models | Code | 4
RAT: Retrieval Augmented Thoughts Elicit Context-Aware Reasoning in Long-Horizon Generation | Code | 3
AgentPoison: Red-teaming LLM Agents via Poisoning Memory or Knowledge Bases | Code | 3
A Comprehensive Survey of Deep Research: Systems, Methodologies, and Applications | Code | 3
Remote Sensing ChatGPT: Solving Remote Sensing Tasks with ChatGPT and Visual Models | Code | 3

No leaderboard results yet.