SOTAVerified

Decision Making

Papers

Showing 51–100 of 12311 papers

Title | Status | Hype
PokeLLMon: A Human-Parity Agent for Pokemon Battles with Large Language Models | Code | 3
Planning with Diffusion for Flexible Behavior Synthesis | Code | 3
Parallelized Planning-Acting for Efficient LLM-based Multi-Agent Systems | Code | 3
Playing Non-Embedded Card-Based Games with Reinforcement Learning | Code | 3
Rethinking Early Stopping: Refine, Then Calibrate | Code | 3
MDAgents: An Adaptive Collaboration of LLMs for Medical Decision-Making | Code | 3
MineStudio: A Streamlined Package for Minecraft AI Agent Development | Code | 3
Beyond Quacking: Deep Integration of Language Models and RAG into DuckDB | Code | 3
ACEGEN: Reinforcement learning of generative chemical agents for drug discovery | Code | 3
LIBERO: Benchmarking Knowledge Transfer for Lifelong Robot Learning | Code | 3
Auto-RAG: Autonomous Retrieval-Augmented Generation for Large Language Models | Code | 3
Automatic Gradient Estimation for Calibrating Crowd Models with Discrete Decision Making | Code | 3
Behavior Generation with Latent Actions | Code | 3
Attention is not not Explanation | Code | 3
A Survey on the Optimization of Large Language Model-based Agents | Code | 3
AuctionNet: A Novel Benchmark for Decision-Making in Large-Scale Games | Code | 3
A Demonstration of Adaptive Collaboration of Large Language Models for Medical Decision-Making | Code | 3
Automated Hypothesis Validation with Agentic Sequential Falsifications | Code | 3
Game-theoretic LLM: Agent Workflow for Negotiation Games | Code | 3
Hierarchical Prompting Assists Large Language Model on Web Navigation | Code | 3
FinCon: A Synthesized LLM Multi-Agent System with Conceptual Verbal Reinforcement for Enhanced Financial Decision Making | Code | 2
A Survey of Financial AI: Architectures, Advances and Open Challenges | Code | 2
FinMem: A Performance-Enhanced LLM Trading Agent with Layered Memory and Character Design | Code | 2
Astock: A New Dataset and Automated Stock Trading based on Stock-specific News Analyzing Model | Code | 2
Fairness Evaluation for Uplift Modeling in the Absence of Ground Truth | Code | 2
Global birdsong embeddings enable superior transfer learning for bioacoustic classification | Code | 2
Expandable Subspace Ensemble for Pre-Trained Model-Based Class-Incremental Learning | Code | 2
ExpeL: LLM Agents Are Experiential Learners | Code | 2
AdaSociety: An Adaptive Environment with Social Structures for Multi-Agent Decision-Making | Code | 2
A Survey of Time Series Foundation Models: Generalizing Time Series Representation with Large Language Model | Code | 2
Harnessing Explanations: LLM-to-LM Interpreter for Enhanced Text-Attributed Graph Representation Learning | Code | 2
A Review of Safe Reinforcement Learning: Methods, Theory and Applications | Code | 2
Enhancing Autonomous Driving Systems with On-Board Deployed Large Language Models | Code | 2
Distributional Soft Actor-Critic with Three Refinements | Code | 2
Dungeons and Data: A Large-Scale NetHack Dataset | Code | 2
DrivingSphere: Building a High-fidelity 4D World for Closed-loop Simulation | Code | 2
ADAPT: Action-aware Driving Caption Transformer | Code | 2
AriGraph: Learning Knowledge Graph World Models with Episodic Memory for LLM Agents | Code | 2
Do As I Can, Not As I Say: Grounding Language in Robotic Affordances | Code | 2
Driving with LLMs: Fusing Object-Level Vector Modality for Explainable Autonomous Driving | Code | 2
Embodied LLM Agents Learn to Cooperate in Organized Teams | Code | 2
ActiveSplat: High-Fidelity Scene Reconstruction through Active Gaussian Splatting | Code | 2
Digital Player: Evaluating Large Language Models based Human-like Agent in Games | Code | 2
Revocable Deep Reinforcement Learning with Affinity Regularization for Outlier-Robust Graph Matching | Code | 2
Diffusion Actor-Critic with Entropy Regulator | Code | 2
DiLu: A Knowledge-Driven Approach to Autonomous Driving with Large Language Models | Code | 2
DecisionNCE: Embodied Multimodal Representations via Implicit Preference Learning | Code | 2
Doe-1: Closed-Loop Autonomous Driving with Large World Model | Code | 2
Deep Generative Models for Offline Policy Learning: Tutorial, Survey, and Perspectives on Future Directions | Code | 2
Alphazero-like Tree-Search can Guide Large Language Model Decoding and Training | Code | 2

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | SRLA | Average Remaining Cycles | 6.4 | | Unverified