SOTAVerified

Thompson Sampling

Thompson sampling, named after William R. Thompson, is a heuristic for choosing actions that addresses the exploration-exploitation dilemma in the multi-armed bandit problem. It consists of choosing the action that maximizes the expected reward with respect to a randomly drawn belief.
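The idea above ("maximize expected reward with respect to a randomly drawn belief") can be sketched for the simplest case, a Bernoulli multi-armed bandit with Beta posteriors. This is a minimal illustrative sketch, not from the source: the arm reward probabilities and function name are assumptions for the example.

```python
# Minimal sketch of Thompson sampling for a Bernoulli multi-armed bandit,
# assuming independent Beta(1, 1) priors over each arm's success probability.
# (Arm probabilities and names below are illustrative, not from the source.)
import random

def thompson_sampling(true_probs, n_rounds, seed=0):
    rng = random.Random(seed)
    n_arms = len(true_probs)
    # Beta posterior parameters per arm: alpha = successes + 1, beta = failures + 1.
    alpha = [1] * n_arms
    beta = [1] * n_arms
    total_reward = 0
    for _ in range(n_rounds):
        # Draw one belief (a sampled success probability) per arm from its
        # posterior, then play the arm whose sampled value is largest.
        samples = [rng.betavariate(alpha[i], beta[i]) for i in range(n_arms)]
        arm = max(range(n_arms), key=lambda i: samples[i])
        # Observe a Bernoulli reward from the chosen arm.
        reward = 1 if rng.random() < true_probs[arm] else 0
        # Update the chosen arm's posterior with the observed reward.
        alpha[arm] += reward
        beta[arm] += 1 - reward
        total_reward += reward
    return total_reward, alpha, beta

total, alpha, beta = thompson_sampling([0.2, 0.5, 0.8], n_rounds=2000)
# With enough rounds, the best arm tends to accumulate the most pulls.
pulls = [a + b - 2 for a, b in zip(alpha, beta)]
print(pulls, total)
```

Because actions are chosen by sampling from the posterior rather than by a fixed rule, arms with uncertain estimates are still explored occasionally, while arms that look clearly worse are played less and less often.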

Papers

Showing 21–30 of 655 papers

Title | Status | Hype
Langevin Monte Carlo for Contextual Bandits | Code | 1
Langevin Soft Actor-Critic: Efficient Exploration through Uncertainty-Driven Critic Learning | Code | 1
Neural Exploitation and Exploration of Contextual Bandits | Code | 1
Neural Thompson Sampling | Code | 1
Optimizing Posterior Samples for Bayesian Optimization via Rootfinding | Code | 1
Provable and Practical: Efficient Exploration in Reinforcement Learning via Langevin Monte Carlo | Code | 1
Sample-Then-Optimize Batch Neural Thompson Sampling | Code | 1
Adaptive Gating for Single-Photon 3D Imaging | | 0
A Combinatorial Semi-Bandit Approach to Charging Station Selection for Electric Vehicles | | 0
A Closer Look at the Worst-case Behavior of Multi-armed Bandit Algorithms | | 0
