| Automata Learning of Preferences over Temporal Logic Formulas from Pairwise Comparisons | May 23, 2025 | Motion Planning, Sequential Decision Making | Unverified | 0 |
| Reward Is Enough: LLMs Are In-Context Reinforcement Learners | May 21, 2025 | Large Language Model, Reinforcement Learning (RL) | Unverified | 0 |
| Web-Shepherd: Advancing PRMs for Reinforcing Web Agents | May 21, 2025 | Large Language Model, Multimodal Large Language Model | Code Available | 2 |
| Sample and Computationally Efficient Continuous-Time Reinforcement Learning with General Function Approximation | May 20, 2025 | Computational Efficiency, continuous-control | Code Available | 0 |
| LLINBO: Trustworthy LLM-in-the-Loop Bayesian Optimization | May 20, 2025 | Bayesian Optimization, Gaussian Processes | Code Available | 1 |
| Vid2World: Crafting Video Diffusion Models to Interactive World Models | May 20, 2025 | Robot Manipulation, Sequential Decision Making | Unverified | 0 |
| OMGPT: A Sequence Modeling Framework for Data-driven Operational Decision Making | May 19, 2025 | Decision Making, Management | Unverified | 0 |
| Generalization Guarantees for Learning Branch-and-Cut Policies in Integer Programming | May 16, 2025 | Sequential Decision Making, Variable Selection | Unverified | 0 |
| Deep Symbolic Optimization: Reinforcement Learning for Symbolic Mathematics | May 16, 2025 | Equation Discovery, reinforcement-learning | Unverified | 0 |
| Batched Nonparametric Bandits via k-Nearest Neighbor UCB | May 15, 2025 | Decision Making, Marketing | Unverified | 0 |
| Comparing Exploration-Exploitation Strategies of LLMs and Humans: Insights from Standard Multi-armed Bandit Tasks | May 15, 2025 | Decision Making, Decision Making Under Uncertainty | Code Available | 1 |
| Counterfactual Strategies for Markov Decision Processes | May 14, 2025 | counterfactual, Decision Making | Unverified | 0 |
| Sequential Treatment Effect Estimation with Unmeasured Confounders | May 14, 2025 | counterfactual, Sequential Decision Making | Unverified | 0 |
| rfPG: Robust Finite-Memory Policy Gradients for Hidden-Model POMDPs | May 14, 2025 | Decision Making Under Uncertainty, Sequential Decision Making | Unverified | 0 |
| A Practical Introduction to Deep Reinforcement Learning | May 13, 2025 | Autonomous Driving, Decision Making | Unverified | 0 |
| Explainable Reinforcement Learning Agents Using World Models | May 12, 2025 | counterfactual, reinforcement-learning | Unverified | 0 |
| A Multi-Agent Reinforcement Learning Approach for Cooperative Air-Ground-Human Crowdsensing in Emergency Rescue | May 11, 2025 | Decision Making Under Uncertainty, Multi-agent Reinforcement Learning | Unverified | 0 |
| Constrained Online Decision-Making: A Unified Framework | May 11, 2025 | Active Learning, counterfactual | Unverified | 0 |
| RL-DAUNCE: Reinforcement Learning-Driven Data Assimilation with Uncertainty-Aware Constrained Ensembles | May 8, 2025 | Computational Efficiency, Reinforcement Learning (RL) | Unverified | 0 |
| Active Sampling for MRI-based Sequential Decision Making | May 7, 2025 | Decision Making, Diagnostic | Code Available | 0 |
| Policy-labeled Preference Learning: Is Preference Enough for RLHF? | May 6, 2025 | continuous-control, Continuous Control | Unverified | 0 |
| MDPs with a State Sensing Cost | May 6, 2025 | Sequential Decision Making | Unverified | 0 |
| D3HRL: A Distributed Hierarchical Reinforcement Learning Approach Based on Causal Discovery and Spurious Correlation Detection | May 4, 2025 | Causal Discovery, Decision Making | Unverified | 0 |
| Bayesian learning of the optimal action-value function in a Markov decision process | May 3, 2025 | Decision Making, Sequential Decision Making | Unverified | 0 |
| A Minimax-MDP Framework with Future-imposed Conditions for Learning-augmented Problems | May 2, 2025 | Decision Making, Prediction Intervals | Unverified | 0 |