SOTAVerified

Decision Making

Papers

Showing 51–75 of 12311 papers

| Title | Status | Hype |
|---|---|---|
| LIBERO: Benchmarking Knowledge Transfer for Lifelong Robot Learning | Code | 3 |
| Planning with Diffusion for Flexible Behavior Synthesis | Code | 3 |
| Hierarchical Prompting Assists Large Language Model on Web Navigation | Code | 3 |
| MDAgents: An Adaptive Collaboration of LLMs for Medical Decision-Making | Code | 3 |
| Beyond A*: Better Planning with Transformers via Search Dynamics Bootstrapping | Code | 3 |
| Beyond Quacking: Deep Integration of Language Models and RAG into DuckDB | Code | 3 |
| FlashDepth: Real-time Streaming Video Depth Estimation at 2K Resolution | Code | 3 |
| Auto-RAG: Autonomous Retrieval-Augmented Generation for Large Language Models | Code | 3 |
| Behavior Generation with Latent Actions | Code | 3 |
| Evolve Cost-aware Acquisition Functions Using Large Language Models | Code | 3 |
| Game-theoretic LLM: Agent Workflow for Negotiation Games | Code | 3 |
| A Survey on the Optimization of Large Language Model-based Agents | Code | 3 |
| ACEGEN: Reinforcement learning of generative chemical agents for drug discovery | Code | 3 |
| AuctionNet: A Novel Benchmark for Decision-Making in Large-Scale Games | Code | 3 |
| A Smart Multimodal Healthcare Copilot with Powerful LLM Reasoning | Code | 3 |
| Enhancing Decision Analysis with a Large Language Model: pyDecision a Comprehensive Library of MCDA Methods in Python | Code | 3 |
| Embodied Agent Interface: Benchmarking LLMs for Embodied Decision Making | Code | 3 |
| Embodied CoT Distillation From LLM To Off-the-shelf Agents | Code | 3 |
| Evaluating Language Model Agency through Negotiations | Code | 3 |
| Attention is not not Explanation | Code | 3 |
| DiLu: A Knowledge-Driven Approach to Autonomous Driving with Large Language Models | Code | 2 |
| Digital Player: Evaluating Large Language Models based Human-like Agent in Games | Code | 2 |
| Disentangling Memory and Reasoning Ability in Large Language Models | Code | 2 |
| Diffusion Actor-Critic with Entropy Regulator | Code | 2 |
| Distribution-Free, Risk-Controlling Prediction Sets | Code | 2 |
Page 3 of 493

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | SRLA | Average Remaining Cycles | 6.4 | | Unverified |