SOTAVerified

Thompson Sampling

Thompson sampling, named after William R. Thompson, is a heuristic for choosing actions that addresses the exploration-exploitation dilemma in the multi-armed bandit problem. At each step it draws a random belief, that is, a set of model parameters sampled from the current posterior distribution. It then chooses the action that maximizes the expected reward under that sampled belief.
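To make the definition concrete, here is a minimal sketch of Thompson sampling for one common special case. It assumes a Bernoulli bandit, where each arm pays a reward of 0 or 1, and places a Beta(1, 1) prior on each arm's success probability. The function name `thompson_sampling` and the simulated arm probabilities are hypothetical and chosen only for illustration.

```python
import random


def thompson_sampling(true_probs, n_rounds, seed=0):
    """Beta-Bernoulli Thompson sampling on a simulated bandit (illustrative sketch).

    true_probs: hypothetical success probability of each arm, unknown to the agent.
    Returns (pulls per arm, total reward collected).
    """
    rng = random.Random(seed)
    k = len(true_probs)
    # Beta(1, 1) is a uniform prior over each arm's success probability.
    alpha = [1] * k  # 1 + observed successes per arm
    beta = [1] * k   # 1 + observed failures per arm
    pulls = [0] * k
    total_reward = 0
    for _ in range(n_rounds):
        # Draw one plausible success probability per arm from its posterior...
        samples = [rng.betavariate(alpha[i], beta[i]) for i in range(k)]
        # ...and act greedily with respect to that randomly drawn belief.
        arm = max(range(k), key=lambda i: samples[i])
        # Simulate a 0/1 reward from the (hidden) true probability of the chosen arm.
        reward = 1 if rng.random() < true_probs[arm] else 0
        # Conjugate update: a Bernoulli observation keeps the posterior a Beta.
        alpha[arm] += reward
        beta[arm] += 1 - reward
        pulls[arm] += 1
        total_reward += reward
    return pulls, total_reward
```

For example, with `thompson_sampling([0.2, 0.5, 0.7], 2000)` most pulls should go to the third arm as its posterior concentrates. Early on, the wide posteriors still send some pulls to the other arms, which is how the method explores. The Beta prior is used here because it is conjugate to Bernoulli rewards, so each posterior update reduces to incrementing a counter. Other reward models need other priors or approximate posterior sampling.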

Papers

Showing 226–250 of 655 papers

Title | Status | Hype
Sample Efficient Learning of Factored Embeddings of Tensor Fields | | 0
Causal Bandits for Linear Structural Equation Models | Code | 0
Dynamic collaborative filtering Thompson Sampling for cross-domain advertisements recommendation | | 0
A Provably Efficient Model-Free Posterior Sampling Method for Episodic Reinforcement Learning | | 0
Non-Stationary Dynamic Pricing Via Actor-Critic Information-Directed Pricing | | 0
Increasing Students' Engagement to Reminder Emails Through Multi-Armed Bandits | | 0
Using Adaptive Experiments to Rapidly Help Students | | 0
Bayesian Optimization-Based Beam Alignment for MmWave MIMO Communication Systems | | 0
SPRT-based Efficient Best Arm Identification in Stochastic Bandits | | 0
Chimera: A Hybrid Machine Learning Driven Multi-Objective Design Space Exploration Tool for FPGA High-Level Synthesis | | 0
Ranking In Generalized Linear Bandits | Code | 0
Risk-averse Contextual Multi-armed Bandit Problem with Linear Payoffs | | 0
Langevin Monte Carlo for Contextual Bandits | Code | 1
Analysis of Thompson Sampling for Controlling Unknown Linear Diffusion Processes | | 0
Thompson Sampling for (Combinatorial) Pure Exploration | | 0
Thompson Sampling for Robust Transfer in Multi-Task Bandits | Code | 0
Thompson Sampling Achieves Õ(√T) Regret in Linear Quadratic Control | | 0
A Contextual Combinatorial Semi-Bandit Approach to Network Bottleneck Identification | | 0
On Provably Robust Meta-Bayesian Optimization | Code | 0
Top Two Algorithms Revisited | | 0
Regret Bounds for Information-Directed Reinforcement Learning | | 0
A Simple and Optimal Policy Design with Safety against Heavy-Tailed Risk for Stochastic Bandits | | 0
Finite-Time Regret of Thompson Sampling Algorithms for Exponential Family Multi-Armed Bandits | | 0
Bandit Theory and Thompson Sampling-Guided Directed Evolution for Sequence Optimization | | 0
Incentivizing Combinatorial Bandit Exploration | | 0

No leaderboard results yet.