SOTAVerified

Sequential Decision Making

Papers

Showing 1001–1050 of 1210 papers

Title | Status | Hype
Thompson Sampling for Contextual Bandit Problems with Auxiliary Safety Constraints | | 0
Thompson Sampling via Local Uncertainty | Code | 0
Policy Learning for Malaria Control | Code | 0
Adaptive Exploration in Linear Contextual Bandit | | 0
Deep Q-Network for Angry Birds | Code | 0
MABWiser: A Parallelizable Contextual Multi-Armed Bandit Library for Python | Code | 0
The Choice Function Framework for Online Policy Improvement | | 0
Reinforcement Learning for Multi-Objective Optimization of Online Decisions in High-Dimensional Systems | | 0
Generalizing Reinforcement Learning to Unseen Actions | | 0
Collaborative Inter-agent Knowledge Distillation for Reinforcement Learning | | 0
Learning Functionally Decomposed Hierarchies for Continuous Navigation Tasks | | 0
PROVABLY BENEFITS OF DEEP HIERARCHICAL RL | | 0
Selective Network Discovery via Deep Reinforcement Learning on Embedded Spaces | | 0
Back to the Future -- Sequential Alignment of Text Representations | Code | 0
Classification with Costly Features as a Sequential Decision-Making Problem | Code | 0
An Arm-Wise Randomization Approach to Combinatorial Linear Semi-Bandits | | 0
Prediction, Consistency, Curvature: Representation Learning for Locally-Linear Control | Code | 0
Can A User Anticipate What Her Followers Want? | | 0
Interactive Machine Comprehension with Information Seeking Agents | Code | 0
Reinforcement Learning in Healthcare: A Survey | | 0
Exploring Offline Policy Evaluation for the Continuous-Armed Bandit Problem | | 0
Online Planning for Decentralized Stochastic Control with Partial History Sharing | | 0
Bridging Commonsense Reasoning and Probabilistic Planning via a Probabilistic Action Language | | 0
Reward Learning for Efficient Reinforcement Learning in Extractive Document Summarisation | Code | 0
Bandit Convex Optimization in Non-stationary Environments | | 0
Scaling Multi-Armed Bandit Algorithms | | 0
IR-VIC: Unsupervised Discovery of Sub-goals for Transfer in RL | | 0
A Sufficient Statistic for Influence in Structured Multiagent Environments | | 0
Reward Advancement: Transforming Policy under Maximum Causal Entropy Principle | | 0
A Scheme for Dynamic Risk-Sensitive Sequential Decision Making | | 0
Thompson Sampling on Symmetric α-Stable Bandits | | 0
Co-training for Policy Learning | Code | 0
Bridging by Word: Image Grounded Vocabulary Construction for Visual Captioning | Code | 0
Exploiting Relevance for Online Decision-Making in High-Dimensions | | 0
Learning Markov models via low-rank optimization | | 0
A Theoretical Connection Between Statistical Physics and Reinforcement Learning | | 0
A Hierarchical Architecture for Sequential Decision-Making in Autonomous Driving using Deep Reinforcement Learning | Code | 0
Macro-action Multi-time scale Dynamic Programming for Energy Management in Buildings with Phase Change Materials | | 0
Neural Heterogeneous Scheduler | | 0
Non-Stationary Reinforcement Learning: The Blessing of (More) Optimism | | 0
Lifelong Learning with a Changing Action Set | Code | 0
Reinforcement Learning When All Actions are Not Always Available | Code | 0
Learning NP-Hard Multi-Agent Assignment Planning using GNN: Inference on a Random Graph and Provable Auction-Fitted Q-learning | | 0
Learning to Discretize: Solving 1D Scalar Conservation Laws via Deep Reinforcement Learning | Code | 0
Multi-hop Reading Comprehension via Deep Reinforcement Learning based Document Traversal | Code | 0
Knowledge-Based Sequential Decision-Making Under Uncertainty | | 0
Tight Regret Bounds for Infinite-armed Linear Contextual Bandits | | 0
Group Retention when Using Machine Learning in Sequential Decision Making: the Interplay between User Dynamics and Fairness | | 0
Understanding & Generalizing AlphaGo Zero | | 0
Soft Q-Learning with Mutual-Information Regularization | | 0

No leaderboard results yet.