SOTAVerified

Thompson Sampling

Thompson sampling, named after William R. Thompson, is a heuristic for choosing actions that addresses the exploration-exploitation dilemma in the multi-armed bandit problem. At each step it draws a belief at random from the current posterior over the unknown reward model and chooses the action that maximizes the expected reward under that drawn belief.
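The idea can be illustrated with a minimal sketch for Bernoulli rewards, assuming a uniform Beta(1, 1) prior on each arm. The function name `thompson_sampling`, the `true_probs` parameter, and the simulated environment are hypothetical and are not taken from any of the papers listed on this page.

```python
import random


def thompson_sampling(true_probs, n_rounds, seed=0):
    """Simulate Beta-Bernoulli Thompson sampling; return pull counts per arm.

    true_probs is a hypothetical simulated environment: the success
    probability of each arm, which the learner never sees directly.
    """
    rng = random.Random(seed)
    n_arms = len(true_probs)
    successes = [0] * n_arms  # observed rewards of 1 per arm
    failures = [0] * n_arms   # observed rewards of 0 per arm
    pulls = [0] * n_arms

    for _ in range(n_rounds):
        # Draw one belief per arm from its Beta posterior
        # (assumed prior: Beta(1, 1), i.e. uniform).
        samples = [
            rng.betavariate(successes[a] + 1, failures[a] + 1)
            for a in range(n_arms)
        ]
        # Act greedily with respect to the drawn belief.
        arm = max(range(n_arms), key=lambda a: samples[a])

        # Observe a Bernoulli reward and update the posterior counts.
        reward = 1 if rng.random() < true_probs[arm] else 0
        successes[arm] += reward
        failures[arm] += 1 - reward
        pulls[arm] += 1

    return pulls


if __name__ == "__main__":
    print(thompson_sampling([0.2, 0.5, 0.8], n_rounds=2000))
```

Because each arm's sample comes from its own posterior, arms with few observations have wide posteriors and are still selected occasionally (exploration), while arms with strong evidence of high reward are selected most of the time (exploitation).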

Papers

Showing 576-600 of 655 papers

Title | Status | Hype
An Information-Theoretic Analysis for Thompson Sampling with Many Actions | | 0
An Information-Theoretic Analysis of Thompson Sampling | | 0
An Information-Theoretic Analysis of Thompson Sampling for Logistic Bandits | | 0
An Information-Theoretic Analysis of Thompson Sampling with Infinite Action Spaces | | 0
An Online Learning Framework for Energy-Efficient Navigation of Electric Vehicles | | 0
A Nonparametric Contextual Bandit with Arm-level Eligibility Control for Customer Service Routing | | 0
Tsallis-INF: An Optimal Algorithm for Stochastic and Adversarial Bandits | | 0
A Note on Information-Directed Sampling and Thompson Sampling | | 0
An Unbiased Data Collection and Content Exploitation/Exploration Strategy for Personalization | | 0
Apple Tasting Revisited: Bayesian Approaches to Partially Monitored Online Binary Classification | | 0
Approximate information for efficient exploration-exploitation strategies | | 0
Approximate Thompson Sampling for Learning Linear Quadratic Regulators with O(√T) Regret | | 0
A Practical Method for Solving Contextual Bandit Problems Using Decision Trees | | 0
A Provably Efficient Model-Free Posterior Sampling Method for Episodic Reinforcement Learning | | 0
Efficiently Tackling Million-Dimensional Multiobjective Problems: A Direction Sampling and Fine-Tuning Approach | | 0
A Reinforcement Learning based Reset Policy for CDCL SAT Solvers | | 0
A relaxed technical assumption for posterior sampling-based reinforcement learning for control of unknown linear systems | | 0
A Reliability-aware Multi-armed Bandit Approach to Learn and Select Users in Demand Response | | 0
A resource-constrained stochastic scheduling algorithm for homeless street outreach and gleaning edible food | | 0
A sequential Monte Carlo approach to Thompson sampling for Bayesian optimization | | 0
A Simple and Optimal Policy Design with Safety against Heavy-Tailed Risk for Stochastic Bandits | | 0
A study of Thompson Sampling with Parameter h | | 0
Asymptotically Optimal Algorithms for Budgeted Multiple Play Bandits | | 0
Asymptotically Optimal Bandits under Weighted Information | | 0
Asymptotically Optimal Linear Best Feasible Arm Identification with Fixed Budget | | 0

No leaderboard results yet.