| AirLLM: Diffusion Policy-based Adaptive LoRA for Remote Fine-Tuning of LLM over the Air | Jul 15, 2025 | Denoising, Sequential Decision Making | Unverified | 0 |
| LLM-Stackelberg Games: Conjectural Reasoning Equilibria and Their Applications to Spearphishing | Jul 12, 2025 | Decision Making, Misinformation | Unverified | 0 |
| A Survey of Continual Reinforcement Learning | Jun 27, 2025 | Continual Learning, Decision Making | Unverified | 0 |
| Flow-Based Single-Step Completion for Efficient and Expressive Policy Learning | Jun 26, 2025 | Action Generation, Decision Making | Unverified | 0 |
| POLAR: A Pessimistic Model-based Policy Learning Algorithm for Dynamic Treatment Regimes | Jun 25, 2025 | Sequential Decision Making | Unverified | 0 |
| Efficient Strategy Synthesis for MDPs via Hierarchical Block Decomposition | Jun 21, 2025 | Decision Making, Sequential Decision Making | Unverified | 0 |
| UProp: Investigating the Uncertainty Propagation of LLMs in Multi-Step Agentic Decision-Making | Jun 20, 2025 | Decision Making, Question Answering | Code Available | 0 |
| Multi-Armed Bandits With Machine Learning-Generated Surrogate Rewards | Jun 20, 2025 | Decision Making Under Uncertainty, Multi-Armed Bandits | Unverified | 0 |
| Adaptive Action Duration with Contextual Bandits for Deep Reinforcement Learning in Dynamic Environments | Jun 17, 2025 | Atari Games, Board Games | Code Available | 0 |
| Common Benchmarks Undervalue the Generalization Power of Programmatic Policies | Jun 17, 2025 | Sequential Decision Making | Code Available | 0 |
| Leveraging In-Context Learning for Language Model Agents | Jun 16, 2025 | In-Context Learning, Language Modeling | Unverified | 0 |
| Revisiting Clustering of Neural Bandits: Selective Reinitialization for Mitigating Loss of Plasticity | Jun 14, 2025 | Change Detection, Clustering | Unverified | 0 |
| Towards Responsible AI: Advances in Safety, Fairness, and Accountability of Autonomous Systems | Jun 11, 2025 | Autonomous Vehicles, Decision Making | Unverified | 0 |
| TooBadRL: Trigger Optimization to Boost Effectiveness of Backdoor Attacks on Deep Reinforcement Learning | Jun 11, 2025 | Deep Reinforcement Learning, Sequential Decision Making | Code Available | 0 |
| How to Provably Improve Return Conditioned Supervised Learning? | Jun 10, 2025 | Decision Making, Offline RL | Unverified | 0 |
| QForce-RL: Quantized FPGA-Optimized Reinforcement Learning Compute Engine | Jun 8, 2025 | Decision Making, Quantization | Unverified | 0 |
| Contextual Experience Replay for Self-Improvement of Language Agents | Jun 7, 2025 | Decision Making, Large Language Model | Unverified | 0 |
| AutoQD: Automatic Discovery of Diverse Behaviors with Quality-Diversity Optimization | Jun 5, 2025 | continuous-control, Continuous Control | Unverified | 0 |
| TextAtari: 100K Frames Game Playing with Language Agents | Jun 4, 2025 | Atari Games, Decision Making | Code Available | 0 |
| Active Layer-Contrastive Decoding Reduces Hallucination in Large Language Model Generation | May 29, 2025 | Decision Making, Hallucination | Unverified | 0 |
| Emergent Risk Awareness in Rational Agents under Resource Constraints | May 29, 2025 | Sequential Decision Making | Unverified | 0 |
| Adaptive Frontier Exploration on Graphs with Applications to Network-Based Disease Testing | May 27, 2025 | Sequential Decision Making | Unverified | 0 |
| Variational Deep Learning via Implicit Regularization | May 26, 2025 | Deep Learning, Inductive Bias | Unverified | 0 |
| Large Language Models for Planning: A Comprehensive and Systematic Survey | May 26, 2025 | Logical Reasoning, Navigate | Code Available | 1 |
| DDO: Dual-Decision Optimization via Multi-Agent Collaboration for LLM-Based Medical Consultation | May 24, 2025 | Decision Making, Sequential Decision Making | Unverified | 0 |
| Automata Learning of Preferences over Temporal Logic Formulas from Pairwise Comparisons | May 23, 2025 | Motion Planning, Sequential Decision Making | Unverified | 0 |
| Reward Is Enough: LLMs Are In-Context Reinforcement Learners | May 21, 2025 | Large Language Model, Reinforcement Learning (RL) | Unverified | 0 |
| Web-Shepherd: Advancing PRMs for Reinforcing Web Agents | May 21, 2025 | Large Language Model, Multimodal Large Language Model | Code Available | 2 |
| Sample and Computationally Efficient Continuous-Time Reinforcement Learning with General Function Approximation | May 20, 2025 | Computational Efficiency, continuous-control | Code Available | 0 |
| LLINBO: Trustworthy LLM-in-the-Loop Bayesian Optimization | May 20, 2025 | Bayesian Optimization, Gaussian Processes | Code Available | 1 |
| Vid2World: Crafting Video Diffusion Models to Interactive World Models | May 20, 2025 | Robot Manipulation, Sequential Decision Making | Unverified | 0 |
| OMGPT: A Sequence Modeling Framework for Data-driven Operational Decision Making | May 19, 2025 | Decision Making, Management | Unverified | 0 |
| Generalization Guarantees for Learning Branch-and-Cut Policies in Integer Programming | May 16, 2025 | Sequential Decision Making, Variable Selection | Unverified | 0 |
| Deep Symbolic Optimization: Reinforcement Learning for Symbolic Mathematics | May 16, 2025 | Equation Discovery, reinforcement-learning | Unverified | 0 |
| Batched Nonparametric Bandits via k-Nearest Neighbor UCB | May 15, 2025 | Decision Making, Marketing | Unverified | 0 |
| Comparing Exploration-Exploitation Strategies of LLMs and Humans: Insights from Standard Multi-armed Bandit Tasks | May 15, 2025 | Decision Making, Decision Making Under Uncertainty | Code Available | 1 |
| Counterfactual Strategies for Markov Decision Processes | May 14, 2025 | counterfactual, Decision Making | Unverified | 0 |
| Sequential Treatment Effect Estimation with Unmeasured Confounders | May 14, 2025 | counterfactual, Sequential Decision Making | Unverified | 0 |
| rfPG: Robust Finite-Memory Policy Gradients for Hidden-Model POMDPs | May 14, 2025 | Decision Making Under Uncertainty, Sequential Decision Making | Unverified | 0 |
| A Practical Introduction to Deep Reinforcement Learning | May 13, 2025 | Autonomous Driving, Decision Making | Unverified | 0 |
| Explainable Reinforcement Learning Agents Using World Models | May 12, 2025 | counterfactual, reinforcement-learning | Unverified | 0 |
| A Multi-Agent Reinforcement Learning Approach for Cooperative Air-Ground-Human Crowdsensing in Emergency Rescue | May 11, 2025 | Decision Making Under Uncertainty, Multi-agent Reinforcement Learning | Unverified | 0 |
| Constrained Online Decision-Making: A Unified Framework | May 11, 2025 | Active Learning, counterfactual | Unverified | 0 |
| RL-DAUNCE: Reinforcement Learning-Driven Data Assimilation with Uncertainty-Aware Constrained Ensembles | May 8, 2025 | Computational Efficiency, Reinforcement Learning (RL) | Unverified | 0 |
| Active Sampling for MRI-based Sequential Decision Making | May 7, 2025 | Decision Making, Diagnostic | Code Available | 0 |
| Policy-labeled Preference Learning: Is Preference Enough for RLHF? | May 6, 2025 | continuous-control, Continuous Control | Unverified | 0 |
| MDPs with a State Sensing Cost | May 6, 2025 | Sequential Decision Making | Unverified | 0 |
| D3HRL: A Distributed Hierarchical Reinforcement Learning Approach Based on Causal Discovery and Spurious Correlation Detection | May 4, 2025 | Causal Discovery, Decision Making | Unverified | 0 |
| Bayesian learning of the optimal action-value function in a Markov decision process | May 3, 2025 | Decision Making, Sequential Decision Making | Unverified | 0 |
| A Minimax-MDP Framework with Future-imposed Conditions for Learning-augmented Problems | May 2, 2025 | Decision Making, Prediction Intervals | Unverified | 0 |