SOTAVerified

Thompson Sampling

Thompson sampling, named after William R. Thompson, is a heuristic for choosing actions that addresses the exploration-exploitation dilemma in the multi-armed bandit problem. It consists of choosing the action that maximizes the expected reward with respect to a randomly drawn belief.
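
As a rough illustration of the description above, here is a minimal sketch for the Bernoulli bandit case. It assumes binary rewards and a uniform Beta(1, 1) prior on each arm; neither is specified on this page. The function name `thompson_sampling` and its parameters `true_means`, `n_rounds` and `seed` are hypothetical, chosen only for this example. In each round, one mean is sampled per arm from its posterior (the "randomly drawn belief"), and the arm that looks best under that sample is played.

```python
import random


def thompson_sampling(true_means, n_rounds, seed=0):
    """Beta-Bernoulli Thompson sampling sketch (illustrative only).

    true_means: hidden success probability of each arm (simulation input).
    Returns per-arm pull counts, successes, and failures.
    """
    rng = random.Random(seed)
    n_arms = len(true_means)
    successes = [0] * n_arms
    failures = [0] * n_arms
    pulls = [0] * n_arms

    for _ in range(n_rounds):
        # Sample one plausible mean per arm from its Beta posterior.
        # Beta(1 + successes, 1 + failures) follows from the assumed Beta(1, 1) prior.
        samples = [
            rng.betavariate(successes[a] + 1, failures[a] + 1)
            for a in range(n_arms)
        ]
        # Act greedily with respect to the sampled belief.
        arm = max(range(n_arms), key=lambda a: samples[a])

        # Simulate a Bernoulli reward from the hidden true mean.
        reward = 1 if rng.random() < true_means[arm] else 0

        # Update the posterior counts for the arm that was played.
        successes[arm] += reward
        failures[arm] += 1 - reward
        pulls[arm] += 1

    return pulls, successes, failures


if __name__ == "__main__":
    pulls, _, _ = thompson_sampling([0.2, 0.5, 0.8], n_rounds=2000)
    print(pulls)
```

Arms with few observations have wide posteriors, so they are still sampled high occasionally and get explored; as evidence accumulates, the posteriors narrow and play concentrates on the arm with the highest estimated mean.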

Papers

Showing 61-70 of 655 papers

Title | Status | Hype
Active RLHF via Best Policy Learning from Trajectory Preference Feedback | | 0
FedRTS: Federated Robust Pruning via Combinatorial Thompson Sampling | Code | 0
EVaDE : Event-Based Variational Thompson Sampling for Model-Based Reinforcement Learning | | 0
Truthful mechanisms for linear bandit games with private contexts | | 0
Stochastically Constrained Best Arm Identification with Thompson Sampling | | 0
WAPTS: A Weighted Allocation Probability Adjusted Thompson Sampling Algorithm for High-Dimensional and Sparse Experiment Settings | | 0
On Improved Regret Bounds In Bayesian Optimization with Gaussian Noise | | 0
Generalized Bayesian deep reinforcement learning | | 0
An Information-Theoretic Analysis of Thompson Sampling for Logistic Bandits | | 0
BOTS: Batch Bayesian Optimization of Extended Thompson Sampling for Severely Episode-Limited RL Settings | | 0

No leaderboard results yet.