SOTAVerified

Decision Making

Papers

Showing 41-50 of 12311 papers

Title | Status | Hype
Hierarchical Prompting Assists Large Language Model on Web Navigation | Code | 3
FlashDepth: Real-time Streaming Video Depth Estimation at 2K Resolution | Code | 3
Beyond A*: Better Planning with Transformers via Search Dynamics Bootstrapping | Code | 3
Game-theoretic LLM: Agent Workflow for Negotiation Games | Code | 3
Automatic Gradient Estimation for Calibrating Crowd Models with Discrete Decision Making | Code | 3
ACEGEN: Reinforcement learning of generative chemical agents for drug discovery | Code | 3
Evaluating Language Model Agency through Negotiations | Code | 3
A Demonstration of Adaptive Collaboration of Large Language Models for Medical Decision-Making | Code | 3
Automated Hypothesis Validation with Agentic Sequential Falsifications | Code | 3
Auto-RAG: Autonomous Retrieval-Augmented Generation for Large Language Models | Code | 3
Page 5 of 1232

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | SRLA | Average Remaining Cycles | 6.4 | | Unverified