SOTAVerified

Thompson Sampling

Thompson sampling, named after William R. Thompson, is a heuristic for choosing actions that addresses the exploration-exploitation dilemma in the multi-armed bandit problem. At each step, it draws a belief (a set of model parameters) at random from the current posterior distribution and chooses the action that maximizes the expected reward under that sampled belief.
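To make the "sample a belief, then act greedily on it" loop concrete, here is a minimal sketch assuming Bernoulli (0/1) rewards and an independent Beta(1, 1) prior on each arm's success probability, which makes the posterior update a simple count increment. The function name `thompson_sampling` and the simulated `true_probs` are hypothetical illustrations, not taken from any paper in this list.

```python
import random


def thompson_sampling(true_probs, n_rounds, seed=0):
    """Beta-Bernoulli Thompson sampling on a simulated bandit (illustrative sketch).

    true_probs: hypothetical per-arm success probabilities, used only to simulate rewards.
    Returns (pull count per arm, total reward collected).
    """
    rng = random.Random(seed)
    k = len(true_probs)
    # Beta(1, 1) = uniform prior on each arm's success probability.
    alpha = [1] * k  # 1 + observed successes per arm
    beta = [1] * k   # 1 + observed failures per arm
    pulls = [0] * k
    total_reward = 0
    for _ in range(n_rounds):
        # Draw one plausible success probability per arm from its posterior.
        samples = [rng.betavariate(alpha[i], beta[i]) for i in range(k)]
        # Choose the arm that is best under this randomly drawn belief.
        arm = max(range(k), key=lambda i: samples[i])
        # Simulate a Bernoulli reward for the chosen arm.
        reward = 1 if rng.random() < true_probs[arm] else 0
        # Conjugate update: successes go to alpha, failures to beta.
        alpha[arm] += reward
        beta[arm] += 1 - reward
        pulls[arm] += 1
        total_reward += reward
    return pulls, total_reward


if __name__ == "__main__":
    pulls, reward = thompson_sampling([0.2, 0.5, 0.7], n_rounds=2000)
    print("pulls per arm:", pulls, "total reward:", reward)
```

Because each arm is chosen with roughly the posterior probability that it is the best arm, uncertain arms still get tried occasionally (exploration), while the arm with the strongest evidence is pulled most often (exploitation).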

Papers

Showing 76-100 of 655 papers

Title | Status | Hype
Neural Dueling Bandits: Preference-Based Optimization with Human Feedback | - | 0
Thompson Sampling Itself is Differentially Private | - | 0
Scalable Exploration via Ensemble++ | Code | 0
DRL-based Joint Resource Scheduling of eMBB and URLLC in O-RAN | - | 0
Joint User Association and Pairing in Multi-UAV-Assisted NOMA Networks: A Decaying-Epsilon Thompson Sampling Framework | - | 0
Preferential Multi-Objective Bayesian Optimization | - | 0
Bayesian Bandit Algorithms with Approximate Inference in Stochastic Linear Bandits | - | 0
More Efficient Randomized Exploration for Reinforcement Learning via Approximate Sampling | Code | 0
Memory Sequence Length of Data Sampling Impacts the Adaptation of Meta-Reinforcement Learning Agents | - | 0
Improving Reward-Conditioned Policies for Multi-Armed Bandits using Normalized Weight Functions | - | 0
Graph Neural Thompson Sampling | - | 0
A Federated Online Restless Bandit Framework for Cooperative Resource Allocation | - | 0
DISCO: An End-to-End Bandit Framework for Personalised Discount Allocation | - | 0
Two-Stage Resource Allocation in Reconfigurable Intelligent Surface Assisted Hybrid Networks via Multi-Player Bandits | - | 0
Adaptively Learning to Select-Rank in Online Platforms | - | 0
Speculative Decoding via Early-exiting for Faster LLM Inference with Thompson Sampling Control Mechanism | - | 0
A Bayesian Approach to Online Planning | Code | 1
Posterior Sampling via Autoregressive Generation | - | 0
Approximate Thompson Sampling for Learning Linear Quadratic Regulators with O(√T) Regret | - | 0
Cost-efficient Knowledge-based Question Answering with Large Language Models | - | 0
Code Repair with LLMs gives an Exploration-Exploitation Tradeoff | - | 0
On Bits and Bandits: Quantifying the Regret-Information Trade-off | Code | 0
Indexed Minimum Empirical Divergence-Based Algorithms for Linear Bandits | - | 0
No Algorithmic Collusion in Two-Player Blindfolded Game with Thompson Sampling | - | 0
Understanding the Training and Generalization of Pretrained Transformer for Sequential Decision Making | - | 0

No leaderboard results yet.