| Deep Reinforcement Learning based Recommendation with Explicit User-Item Interactions Modeling | Oct 29, 2018 | Collaborative Filtering, Decision Making | Code Available | 1 | 5 |
| Skill Set Optimization: Reinforcing Language Model Behavior via Transferable Skills | Feb 5, 2024 | Decision Making, Language Modeling | Code Available | 1 | 5 |
| Layered and Staged Monte Carlo Tree Search for SMT Strategy Synthesis | Jan 30, 2024 | Decision Making, Efficient Exploration | Code Available | 1 | 5 |
| Deep Reinforcement Learning for Entity Alignment | Mar 7, 2022 | Decision Making, Deep Reinforcement Learning | Code Available | 1 | 5 |
| Can Increasing Input Dimensionality Improve Deep Reinforcement Learning? | Mar 3, 2020 | Decision Making, Deep Reinforcement Learning | Code Available | 1 | 5 |
| LLINBO: Trustworthy LLM-in-the-Loop Bayesian Optimization | May 20, 2025 | Bayesian Optimization, Gaussian Processes | Code Available | 1 | 5 |
| The Sandbox Environment for Generalizable Agent Research (SEGAR) | Mar 19, 2022 | Decision Making, Sequential Decision Making | Code Available | 1 | 5 |
| Thinking Fast and Slow with Deep Learning and Tree Search | May 23, 2017 | Decision Making, Deep Learning | Code Available | 1 | 5 |
| Does the Markov Decision Process Fit the Data: Testing for the Markov Property in Sequential Decision Making | Feb 5, 2020 | Decision Making, reinforcement-learning | Code Available | 1 | 5 |
| Training a Generally Curious Agent | Feb 24, 2025 | Decision Making, Efficient Exploration | Code Available | 1 | 5 |
| Learning Discrete World Models for Heuristic Search | Sep 14, 2024 | Deep Reinforcement Learning, Heuristic Search | Code Available | 1 | 5 |
| Unified Models of Human Behavioral Agents in Bandits, Contextual Bandits and RL | May 10, 2020 | Decision Making, Lifelong learning | Code Available | 1 | 5 |
| IQ-Learn: Inverse soft-Q Learning for Imitation | Jun 23, 2021 | Atari Games, Continuous Control | Code Available | 1 | 5 |
| Can language agents be alternatives to PPO? A Preliminary Empirical Study On OpenAI Gym | Dec 6, 2023 | Benchmarking, Decision Making | Code Available | 1 | 5 |
| Large Language Model as a Policy Teacher for Training Reinforcement Learning Agents | Nov 22, 2023 | Decision Making, Language Modeling | Code Available | 1 | 5 |
| Learning Dynamic Belief Graphs to Generalize on Text-Based Games | Feb 21, 2020 | Decision Making, Knowledge Graphs | Code Available | 1 | 5 |
| Free from Bellman Completeness: Trajectory Stitching via Model-based Return-conditioned Supervised Learning | Oct 30, 2023 | Decision Making, Offline RL | Code Available | 1 | 5 |
| Extracting Reward Functions from Diffusion Models | Jun 1, 2023 | Decision Making, Image Generation | Code Available | 1 | 5 |
| How Can LLM Guide RL? A Value-Based Approach | Feb 25, 2024 | Decision Making, Reinforcement Learning (RL) | Code Available | 1 | 5 |
| Enabling Intelligent Interactions between an Agent and an LLM: A Reinforcement Learning Approach | Jun 6, 2023 | Decision Making, Sequential Decision Making | Code Available | 1 | 5 |
| Efficient Symptom Inquiring and Diagnosis via Adaptive Alignment of Reinforcement Learning and Classification | Dec 1, 2021 | Decision Making, Diagnostic | Code Available | 1 | 5 |
| Breadcrumbs to the Goal: Goal-Conditioned Exploration from Human-in-the-Loop Feedback | Jul 20, 2023 | Decision Making, reinforcement-learning | Code Available | 1 | 5 |
| Hybrid Multi-agent Deep Reinforcement Learning for Autonomous Mobility on Demand Systems | Dec 14, 2022 | Decision Making, Deep Reinforcement Learning | Code Available | 1 | 5 |
| Is Reinforcement Learning (Not) for Natural Language Processing: Benchmarks, Baselines, and Building Blocks for Natural Language Policy Optimization | Oct 3, 2022 | Decision Making, Policy Gradient Methods | Code Available | 1 | 5 |
| Dynamic Multi-Robot Task Allocation under Uncertainty and Temporal Constraints | May 27, 2020 | Decision Making, Decision Making Under Uncertainty | Code Available | 1 | 5 |
| AdaPlanner: Adaptive Planning from Feedback with Language Models | May 26, 2023 | Decision Making, Hallucination | Code Available | 1 | 5 |
| Bridging POMDPs and Bayesian decision making for robust maintenance planning under model uncertainty: An application to railway systems | Dec 15, 2022 | Decision Making, Sequential Decision Making | Code Available | 1 | 5 |
| Large Language Models for Planning: A Comprehensive and Systematic Survey | May 26, 2025 | Logical Reasoning, Navigate | Code Available | 1 | 5 |
| Approximate Inference in Discrete Distributions with Monte Carlo Tree Search and Value Functions | Oct 15, 2019 | Decision Making, Decision Making Under Uncertainty | Code Available | 1 | 5 |
| Effective Reinforcement Learning through Evolutionary Surrogate-Assisted Prescription | Feb 13, 2020 | Decision Making, reinforcement-learning | Code Available | 1 | 5 |
| RELIEF: Reinforcement Learning Empowered Graph Feature Prompt Tuning | Aug 6, 2024 | Combinatorial Optimization, Graph Neural Network | Code Available | 1 | 5 |
| LogiCity: Advancing Neuro-Symbolic AI with Abstract Urban Simulation | Nov 1, 2024 | Logical Reasoning, Sequential Decision Making | Code Available | 1 | 5 |
| Dynamic Causal Bayesian Optimization | Oct 26, 2021 | Bayesian Optimization, Causal Inference | Code Available | 1 | 5 |
| Masked Trajectory Models for Prediction, Representation, and Control | May 4, 2023 | continuous-control, Continuous Control | Code Available | 1 | 5 |
| Efficient Nonmyopic Bayesian Optimization via One-Shot Multi-Step Trees | Jun 29, 2020 | Bayesian Optimization, Decision Making | Code Available | 1 | 5 |
| An Alternative Softmax Operator for Reinforcement Learning | Dec 16, 2016 | Decision Making, reinforcement-learning | Code Available | 1 | 5 |
| Multi-task Causal Learning with Gaussian Processes | Sep 27, 2020 | Active Learning, Bayesian Optimization | Code Available | 1 | 5 |
| ContainerGym: A Real-World Reinforcement Learning Benchmark for Resource Allocation | Jul 6, 2023 | Decision Making, Deep Reinforcement Learning | Code Available | 1 | 5 |
| Occupancy Anticipation for Efficient Exploration and Navigation | Aug 21, 2020 | Decision Making, Efficient Exploration | Code Available | 1 | 5 |
| On Generalization Across Environments In Multi-Objective Reinforcement Learning | Mar 2, 2025 | Decision Making, Multi-Objective Reinforcement Learning | Code Available | 1 | 5 |
| Independent Reinforcement Learning for Weakly Cooperative Multiagent Traffic Control Problem | Apr 22, 2021 | Decision Making, reinforcement-learning | Code Available | 1 | 5 |
| Premier-TACO is a Few-Shot Policy Learner: Pretraining Multitask Representation via Temporal Action-Driven Contrastive Loss | Feb 9, 2024 | Computational Efficiency, continuous-control | Code Available | 1 | 5 |
| An empirical evaluation of active inference in multi-armed bandits | Jan 21, 2021 | BIG-bench Machine Learning, Decision Making | Code Available | 1 | 5 |
| Counterfactual Explanations in Sequential Decision Making Under Uncertainty | Jul 6, 2021 | counterfactual, Counterfactual Explanation | Code Available | 1 | 5 |
| Adaptive Stress Testing of Trajectory Predictions in Flight Management Systems | Nov 4, 2020 | Decision Making, Management | Code Available | 1 | 5 |
| Decision Mamba: Reinforcement Learning via Sequence Modeling with Selective State Spaces | Mar 29, 2024 | Decision Making, Mamba | Code Available | 1 | 5 |
| Learning Multi-Level Hierarchies with Hindsight | Dec 4, 2017 | Decision Making, Hierarchical Reinforcement Learning | Code Available | 1 | 5 |
| Reinforcement learning with combinatorial actions for coupled restless bandits | Mar 1, 2025 | reinforcement-learning, Reinforcement Learning | Code Available | 1 | 5 |
| Reinforced Sequential Decision-Making for Sepsis Treatment: The POSNEGDM Framework with Mortality Classifier and Transformer | Mar 12, 2024 | Decision Making, Sequential Decision Making | Code Available | 1 | 5 |
| Discrete-Time Distribution Steering using Monte Carlo Tree Search | Dec 9, 2024 | Decision Making, Sequential Decision Making | Code Available | 0 | 5 |