| Multi-Agent Reinforcement Learning for Autonomous Driving: A Survey | Aug 19, 2024 | Autonomous Driving, Decision Making | Code Available | 5 |
| Eureka: Human-Level Reward Design via Coding Large Language Models | Oct 19, 2023 | Decision Making, In-Context Learning | Code Available | 4 |
| Reflexion: Language Agents with Verbal Reinforcement Learning | Mar 20, 2023 | Decision Making, HumanEval | Code Available | 4 |
| MineStudio: A Streamlined Package for Minecraft AI Agent Development | Dec 24, 2024 | AI Agent, Decision Making | Code Available | 3 |
| Reinforcement Learning Meets Visual Odometry | Jul 22, 2024 | Decision Making, reinforcement-learning | Code Available | 3 |
| Web-Shepherd: Advancing PRMs for Reinforcing Web Agents | May 21, 2025 | Large Language Model, Multimodal Large Language Model | Code Available | 2 |
| AlphaMaze: Enhancing Large Language Models' Spatial Intelligence via GRPO | Feb 20, 2025 | Autonomous Navigation, Navigate | Code Available | 2 |
| MacroHFT: Memory Augmented Context-aware Reinforcement Learning On High Frequency Trading | Jun 20, 2024 | Algorithmic Trading, Decision Making | Code Available | 2 |
| Deep Generative Models for Offline Policy Learning: Tutorial, Survey, and Perspectives on Future Directions | Feb 21, 2024 | Decision Making, Imitation Learning | Code Available | 2 |
| Jack of All Trades, Master of Some, a Multi-Purpose Transformer Agent | Feb 15, 2024 | All, Decision Making | Code Available | 2 |
| STEVE-1: A Generative Model for Text-to-Behavior in Minecraft | Jun 1, 2023 | Decision Making, Image Generation | Code Available | 2 |
| Trieste: Efficiently Exploring The Depths of Black-box Functions with TensorFlow | Feb 16, 2023 | Active Learning, Bayesian Optimization | Code Available | 2 |
| ACE: Cooperative Multi-agent Q-learning with Bidirectional Action-Dependency | Nov 29, 2022 | Decision Making, Multi-agent Reinforcement Learning | Code Available | 2 |
| Dungeons and Data: A Large-Scale NetHack Dataset | Nov 1, 2022 | Decision Making, NetHack | Code Available | 2 |
| Multi-Agent Reinforcement Learning is a Sequence Modeling Problem | May 30, 2022 | Decision Making, MuJoCo | Code Available | 2 |
| Pre-Trained Language Models for Interactive Decision-Making | Feb 3, 2022 | Decision Making, Imitation Learning | Code Available | 2 |
| Large Language Models for Planning: A Comprehensive and Systematic Survey | May 26, 2025 | Logical Reasoning, Navigate | Code Available | 1 |
| LLINBO: Trustworthy LLM-in-the-Loop Bayesian Optimization | May 20, 2025 | Bayesian Optimization, Gaussian Processes | Code Available | 1 |
| Comparing Exploration-Exploitation Strategies of LLMs and Humans: Insights from Standard Multi-armed Bandit Tasks | May 15, 2025 | Decision Making, Decision Making Under Uncertainty | Code Available | 1 |
| On Generalization Across Environments In Multi-Objective Reinforcement Learning | Mar 2, 2025 | Decision Making, Multi-Objective Reinforcement Learning | Code Available | 1 |
| Reinforcement learning with combinatorial actions for coupled restless bandits | Mar 1, 2025 | reinforcement-learning, Reinforcement Learning | Code Available | 1 |
| Training a Generally Curious Agent | Feb 24, 2025 | Decision Making, Efficient Exploration | Code Available | 1 |
| Co-Activation Graph Analysis of Safety-Verified and Explainable Deep Reinforcement Learning Policies | Jan 6, 2025 | Decision Making, Deep Reinforcement Learning | Code Available | 1 |
| LogiCity: Advancing Neuro-Symbolic AI with Abstract Urban Simulation | Nov 1, 2024 | Logical Reasoning, Sequential Decision Making | Code Available | 1 |
| DataEnvGym: Data Generation Agents in Teacher Environments with Student Feedback | Oct 8, 2024 | Math, Sequential Decision Making | Code Available | 1 |
| Learning Discrete World Models for Heuristic Search | Sep 14, 2024 | Deep Reinforcement Learning, Heuristic Search | Code Available | 1 |
| RELIEF: Reinforcement Learning Empowered Graph Feature Prompt Tuning | Aug 6, 2024 | Combinatorial Optimization, Graph Neural Network | Code Available | 1 |
| Re-ReST: Reflection-Reinforced Self-Training for Language Agents | Jun 3, 2024 | Code Generation, Image Generation | Code Available | 1 |
| Pursuing Overall Welfare in Federated Learning through Sequential Decision Making | May 31, 2024 | Decision Making, Fairness | Code Available | 1 |
| Rethinking Transformers in Solving POMDPs | May 27, 2024 | Decision Making, Reinforcement Learning (RL) | Code Available | 1 |
| Decision Mamba: Reinforcement Learning via Sequence Modeling with Selective State Spaces | Mar 29, 2024 | Decision Making, Mamba | Code Available | 1 |
| Reinforced Sequential Decision-Making for Sepsis Treatment: The POSNEGDM Framework with Mortality Classifier and Transformer | Mar 12, 2024 | Decision Making, Sequential Decision Making | Code Available | 1 |
| TRAD: Enhancing LLM Agents with Step-Wise Thought Retrieval and Aligned Decision | Mar 10, 2024 | Language Modelling, Large Language Model | Code Available | 1 |
| How Can LLM Guide RL? A Value-Based Approach | Feb 25, 2024 | Decision Making, Reinforcement Learning (RL) | Code Available | 1 |
| PRISE: LLM-Style Sequence Compression for Learning Temporal Action Abstractions in Control | Feb 16, 2024 | continuous-control, Continuous Control | Code Available | 1 |
| Premier-TACO is a Few-Shot Policy Learner: Pretraining Multitask Representation via Temporal Action-Driven Contrastive Loss | Feb 9, 2024 | Computational Efficiency, continuous-control | Code Available | 1 |
| Sym-Q: Adaptive Symbolic Regression via Sequential Decision-Making | Feb 7, 2024 | Decision Making, regression | Code Available | 1 |
| Skill Set Optimization: Reinforcing Language Model Behavior via Transferable Skills | Feb 5, 2024 | Decision Making, Language Modeling | Code Available | 1 |
| Layered and Staged Monte Carlo Tree Search for SMT Strategy Synthesis | Jan 30, 2024 | Decision Making, Efficient Exploration | Code Available | 1 |
| LLF-Bench: Benchmark for Interactive Learning from Language Feedback | Dec 11, 2023 | Information Retrieval, OpenAI Gym | Code Available | 1 |
| Can language agents be alternatives to PPO? A Preliminary Empirical Study On OpenAI Gym | Dec 6, 2023 | Benchmarking, Decision Making | Code Available | 1 |
| Large Language Model as a Policy Teacher for Training Reinforcement Learning Agents | Nov 22, 2023 | Decision Making, Language Modeling | Code Available | 1 |
| Free from Bellman Completeness: Trajectory Stitching via Model-based Return-conditioned Supervised Learning | Oct 30, 2023 | Decision Making, Offline RL | Code Available | 1 |
| Out of the Cage: How Stochastic Parrots Win in Cyber Security Environments | Aug 23, 2023 | CyberBattleSim, CyberBattleSim (RL) chain scenario | Code Available | 1 |
| Breadcrumbs to the Goal: Goal-Conditioned Exploration from Human-in-the-Loop Feedback | Jul 20, 2023 | Decision Making, reinforcement-learning | Code Available | 1 |
| ContainerGym: A Real-World Reinforcement Learning Benchmark for Resource Allocation | Jul 6, 2023 | Decision Making, Deep Reinforcement Learning | Code Available | 1 |
| Sampling from Gaussian Process Posteriors using Stochastic Gradient Descent | Jun 20, 2023 | Bayesian Optimization, Decision Making | Code Available | 1 |
| Simplified Temporal Consistency Reinforcement Learning | Jun 15, 2023 | Decision Making, reinforcement-learning | Code Available | 1 |
| Decision Stacks: Flexible Reinforcement Learning via Modular Generative Models | Jun 9, 2023 | Decision Making, reinforcement-learning | Code Available | 1 |
| Enabling Intelligent Interactions between an Agent and an LLM: A Reinforcement Learning Approach | Jun 6, 2023 | Decision Making, Sequential Decision Making | Code Available | 1 |