| Continuous control with deep reinforcement learning | Sep 9, 2015 | Action Detection, continuous-control | Code Available | 1 |
| Deep Recurrent Q-Learning for Partially Observable MDPs | Jul 23, 2015 | Atari Games, Deep Reinforcement Learning | Code Available | 1 |
| Playing Atari with Deep Reinforcement Learning | Dec 19, 2013 | Atari Games, Deep Reinforcement Learning | Code Available | 1 |
| Evaluating Reinforcement Learning Algorithms for Navigation in Simulated Robotic Quadrupeds: A Comparative Study Inspired by Guide Dog Behaviour | Jul 17, 2025 | Autonomous Navigation, Q-Learning | Unverified | 0 |
| Personalized Exercise Recommendation with Semantically-Grounded Knowledge Tracing | Jul 15, 2025 | Knowledge Tracing, Math | Code Available | 0 |
| A Data-Ensemble-Based Approach for Sample-Efficient LQ Control of Linear Time-Varying Systems | Jun 30, 2025 | Q-Learning | Unverified | 0 |
| ADDQ: Adaptive Distributional Double Q-Learning | Jun 24, 2025 | Distributional Reinforcement Learning, MuJoCo | Code Available | 0 |
| Reinforcement Learning-Based Policy Optimisation For Heterogeneous Radio Access | Jun 18, 2025 | Q-Learning, reinforcement-learning | Unverified | 0 |
| ReinDSplit: Reinforced Dynamic Split Learning for Pest Recognition in Precision Agriculture | Jun 16, 2025 | Q-Learning, Reinforcement Learning (RL) | Unverified | 0 |
| Implicit Constraint-Aware Off-Policy Correction for Offline Reinforcement Learning | Jun 16, 2025 | Q-Learning | Unverified | 0 |
| "What are my options?": Explaining RL Agents with Diverse Near-Optimal Alternatives (Extended) | Jun 11, 2025 | Diversity, Q-Learning | Unverified | 0 |
| Q-learning-based Hierarchical Cooperative Local Search for Steelmaking-continuous Casting Scheduling Problem | Jun 10, 2025 | Q-Learning, Scheduling | Unverified | 0 |
| Regret-Optimal Q-Learning with Low Cost for Single-Agent and Federated Reinforcement Learning | Jun 5, 2025 | Q-Learning, Reinforcement Learning (RL) | Unverified | 0 |
| Bridging the Performance Gap Between Target-Free and Target-Based Reinforcement Learning With Iterated Q-Learning | Jun 4, 2025 | Q-Learning | Unverified | 0 |
| Improving Performance of Spike-based Deep Q-Learning using Ternary Neurons | Jun 3, 2025 | Atari Games, Decision Making | Unverified | 0 |
| Reinforcement Learning for Hanabi | May 31, 2025 | Card Games, Deep Reinforcement Learning | Unverified | 0 |
| Entropic Risk Optimization in Discounted MDPs: Sample Complexity Bounds with a Generative Model | May 30, 2025 | Q-Learning | Unverified | 0 |
| On Global Convergence Rates for Federated Policy Gradient under Heterogeneous Environment | May 29, 2025 | Federated Learning, Policy Gradient Methods | Unverified | 0 |
| Learning to Charge More: A Theoretical Study of Collusion by Q-Learning Agents | May 28, 2025 | Q-Learning | Unverified | 0 |
| BOFormer: Learning to Solve Multi-Objective Bayesian Optimization via Non-Markovian RL | May 28, 2025 | Bayesian Optimization, Hyperparameter Optimization | Unverified | 0 |
| A General-Purpose Theorem for High-Probability Bounds of Stochastic Approximation with Polyak Averaging | May 27, 2025 | Q-Learning | Unverified | 0 |
| Inverse Q-Learning Done Right: Offline Imitation Learning in Q^π-Realizable MDPs | May 26, 2025 | Imitation Learning, Q-Learning | Code Available | 0 |
| Distributionally Robust Deep Q-Learning | May 25, 2025 | Q-Learning | Code Available | 0 |
| Reinforcement Learning for Stock Transactions | May 22, 2025 | Q-Learning, reinforcement-learning | Unverified | 0 |
| Offline Guarded Safe Reinforcement Learning for Medical Treatment Optimization Strategies | May 22, 2025 | Offline RL, Q-Learning | Unverified | 0 |
| OPA-Pack: Object-Property-Aware Robotic Bin Packing | May 19, 2025 | Object, Q-Learning | Unverified | 0 |
| When a Reinforcement Learning Agent Encounters Unknown Unknowns | May 19, 2025 | AI Agent, Q-Learning | Unverified | 0 |
| Imagination-Limited Q-Learning for Offline Reinforcement Learning | May 18, 2025 | D4RL, Q-Learning | Unverified | 0 |
| Automatic Reward Shaping from Confounded Offline Data | May 16, 2025 | Atari Games, Deep Reinforcement Learning | Unverified | 0 |
| ShiQ: Bringing back Bellman to LLMs | May 16, 2025 | Q-Learning, Reinforcement Learning (RL) | Unverified | 0 |
| Bias or Optimality? Disentangling Bayesian Inference and Learning Biases in Human Decision-Making | May 12, 2025 | Bayesian Inference, Decision Making | Unverified | 0 |
| Convert Language Model into a Value-based Strategic Planner | May 11, 2025 | Language Modeling, Language Modelling | Unverified | 0 |
| Universal Approximation Theorem for Deep Q-Learning via FBSDE System | May 9, 2025 | Q-Learning | Unverified | 0 |
| A Large Language Model-Enhanced Q-learning for Capacitated Vehicle Routing Problem with Time Windows | May 9, 2025 | Combinatorial Optimization, Language Modeling | Unverified | 0 |
| A critical assessment of reinforcement learning methods for microswimmer navigation in complex flows | May 8, 2025 | Autonomous Navigation, Hyperparameter Optimization | Code Available | 0 |
| Merging and Disentangling Views in Visual Reinforcement Learning for Robotic Manipulation | May 7, 2025 | Disentanglement, Lightweight Deployment | Unverified | 0 |
| VLM Q-Learning: Aligning Vision-Language Models for Interactive Decision-Making | May 6, 2025 | Decision Making, General Knowledge | Unverified | 0 |
| Meta-Black-Box-Optimization through Offline Q-function Learning | May 4, 2025 | Benchmarking, Mamba | Code Available | 0 |
| Universal Approximation Theorem of Deep Q-Networks | May 4, 2025 | Deep Reinforcement Learning, Q-Learning | Unverified | 0 |
| Rank-One Modified Value Iteration | May 3, 2025 | Q-Learning | Unverified | 0 |
| Dynamic and Distributed Routing in IoT Networks based on Multi-Objective Q-Learning | May 1, 2025 | Q-Learning | Unverified | 0 |
| Learning Neural Control Barrier Functions from Offline Data with Conservatism | May 1, 2025 | Q-Learning | Unverified | 0 |
| Q-Learning with Clustered-SMART (cSMART) Data: Examining Moderators in the Construction of Clustered Adaptive Interventions | May 1, 2025 | Q-Learning | Unverified | 0 |
| Interactive Double Deep Q-network: Integrating Human Interventions and Evaluative Predictions in Reinforcement Learning of Autonomous Driving | Apr 28, 2025 | Autonomous Driving, Q-Learning | Unverified | 0 |
| Non-Asymptotic Guarantees for Average-Reward Q-Learning with Adaptive Stepsizes | Apr 25, 2025 | Q-Learning | Unverified | 0 |
| SAPO-RL: Sequential Actuator Placement Optimization for Fuselage Assembly via Reinforcement Learning | Apr 24, 2025 | Decision Making, Q-Learning | Unverified | 0 |
| Mixed-Precision Conjugate Gradient Solvers with RL-Driven Precision Tuning | Apr 19, 2025 | Computational Efficiency, Q-Learning | Unverified | 0 |
| Understanding the theoretical properties of projected Bellman equation, linear Q-learning, and approximate value iteration | Apr 15, 2025 | Q-Learning | Unverified | 0 |
| Nash Equilibrium Between Consumer Electronic Devices and DoS Attacker for Distributed IoT-enabled RSE Systems | Apr 13, 2025 | Q-Learning, State Estimation | Unverified | 0 |
| A Framework of decision-relevant observability: Reinforcement Learning converges under relative ignorability | Apr 10, 2025 | Causal Inference, Decision Making | Unverified | 0 |