SOTAVerified

Decision Making

Papers

Showing 51–75 of 12311 papers

Title | Status | Hype
Beyond A*: Better Planning with Transformers via Search Dynamics Bootstrapping | Code | 3
Planning with Diffusion for Flexible Behavior Synthesis | Code | 3
Beyond Quacking: Deep Integration of Language Models and RAG into DuckDB | Code | 3
Behavior Generation with Latent Actions | Code | 3
Automatic Gradient Estimation for Calibrating Crowd Models with Discrete Decision Making | Code | 3
ACEGEN: Reinforcement learning of generative chemical agents for drug discovery | Code | 3
Auto-RAG: Autonomous Retrieval-Augmented Generation for Large Language Models | Code | 3
FlashDepth: Real-time Streaming Video Depth Estimation at 2K Resolution | Code | 3
A Survey on the Optimization of Large Language Model-based Agents | Code | 3
Attention is not not Explanation | Code | 3
Evaluating Language Model Agency through Negotiations | Code | 3
A Smart Multimodal Healthcare Copilot with Powerful LLM Reasoning | Code | 3
MDAgents: An Adaptive Collaboration of LLMs for Medical Decision-Making | Code | 3
Enhancing Decision Analysis with a Large Language Model: pyDecision a Comprehensive Library of MCDA Methods in Python | Code | 3
Embodied Agent Interface: Benchmarking LLMs for Embodied Decision Making | Code | 3
AuctionNet: A Novel Benchmark for Decision-Making in Large-Scale Games | Code | 3
Embodied CoT Distillation From LLM To Off-the-shelf Agents | Code | 3
Evolve Cost-aware Acquisition Functions Using Large Language Models | Code | 3
A Demonstration of Adaptive Collaboration of Large Language Models for Medical Decision-Making | Code | 3
Game-theoretic LLM: Agent Workflow for Negotiation Games | Code | 3
ADAPT: Action-aware Driving Caption Transformer | Code | 2
AdaFlow: Imitation Learning with Variance-Adaptive Flow-Based Policies | Code | 2
Disentangling Memory and Reasoning Ability in Large Language Models | Code | 2
Digital Player: Evaluating Large Language Models based Human-like Agent in Games | Code | 2
DiLu: A Knowledge-Driven Approach to Autonomous Driving with Large Language Models | Code | 2

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | SRLA | Average Remaining Cycles | 6.4 | - | Unverified