| BOTS: Batch Bayesian Optimization of Extended Thompson Sampling for Severely Episode-Limited RL Settings | Nov 30, 2024 | Bayesian Optimization, Policy Gradient Methods | Unverified | 0 |
| Solving Rubik's Cube Without Tricky Sampling | Nov 29, 2024 | Policy Gradient Methods, Reinforcement Learning (RL) | Unverified | 0 |
| Q-SFT: Q-Learning for Language Models via Supervised Fine-Tuning | Nov 7, 2024 | Offline RL, Policy Gradient Methods | Unverified | 0 |
| Policy Gradient for Robust Markov Decision Processes | Oct 29, 2024 | Policy Gradient Methods | Code Available | 0 |
| Truncating Trajectories in Monte Carlo Policy Evaluation: an Adaptive Approach | Oct 17, 2024 | Policy Gradient Methods, Reinforcement Learning (RL) | Unverified | 0 |
| Learning in complex action spaces without policy gradients | Oct 8, 2024 | Policy Gradient Methods, Q-Learning | Unverified | 0 |
| Strongly-polynomial time and validation analysis of policy gradient methods | Sep 28, 2024 | Policy Gradient Methods, Reinforcement Learning (RL) | Unverified | 0 |
| Landscape of Policy Optimization for Finite Horizon MDPs with General State and Action | Sep 25, 2024 | Policy Gradient Methods | Unverified | 0 |
| Near-Optimal Policy Identification in Robust Constrained Markov Decision Processes via Epigraph Form | Aug 29, 2024 | Form, Policy Gradient Methods | Code Available | 0 |
| Reinforcement Learning for Causal Discovery without Acyclicity Constraints | Aug 24, 2024 | Causal Discovery, Efficient Exploration | Unverified | 0 |