SOTAVerified

Thompson Sampling

Thompson sampling, named after William R. Thompson, is a heuristic for choosing actions that addresses the exploration-exploitation dilemma in the multi-armed bandit problem. It consists of choosing the action that maximizes the expected reward with respect to a randomly drawn belief.
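
As a minimal sketch of the idea described above, the following assumes a Bernoulli bandit (each arm pays 0 or 1) with a uniform Beta(1, 1) prior on every arm. Neither assumption comes from this page, and the function name `thompson_sampling` and its parameters are hypothetical. In each round, the "randomly drawn belief" is one sample per arm from that arm's Beta posterior, and the action chosen is the arm whose sampled mean reward is largest.

```python
import random


def thompson_sampling(true_probs, n_rounds, seed=0):
    """Simulate Beta-Bernoulli Thompson sampling (illustrative sketch).

    true_probs: hypothetical success probability of each arm, used only
                to simulate rewards; the learner never reads it directly.
    Returns per-arm pull counts, success counts, and failure counts.
    """
    rng = random.Random(seed)
    n_arms = len(true_probs)
    successes = [0] * n_arms  # observed rewards of 1 per arm
    failures = [0] * n_arms   # observed rewards of 0 per arm
    pulls = [0] * n_arms

    for _ in range(n_rounds):
        # Randomly drawn belief: one sampled mean reward per arm from its
        # Beta(successes + 1, failures + 1) posterior (uniform prior assumed).
        sampled_means = [
            rng.betavariate(successes[a] + 1, failures[a] + 1)
            for a in range(n_arms)
        ]
        # Act greedily with respect to the sampled belief.
        arm = max(range(n_arms), key=lambda a: sampled_means[a])

        # Simulate a Bernoulli reward and update that arm's posterior counts.
        reward = 1 if rng.random() < true_probs[arm] else 0
        successes[arm] += reward
        failures[arm] += 1 - reward
        pulls[arm] += 1

    return pulls, successes, failures
```

Calling, for example, `thompson_sampling([0.1, 0.9], 2000)` should concentrate most pulls on the second arm. Early on the posteriors are wide, so the samples vary and every arm gets tried. As evidence accumulates the posteriors narrow and the better arm wins the draw almost every round.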

Papers

Showing 601-650 of 655 papers

Title | Status | Hype
The Choice of Noninformative Priors for Thompson Sampling in Multiparameter Bandit Models | | 0
Asymptotic Convergence of Thompson Sampling | | 0
Asymptotic Performance of Thompson Sampling in the Batched Multi-Armed Bandits | | 0
Asynchronous Multi Agent Active Search | | 0
Augmented RBMLE-UCB Approach for Adaptive Control of Linear Quadratic Systems | | 0
A Unified and Efficient Coordinating Framework for Autonomous DBMS Tuning | | 0
Automatic Ensemble Learning for Online Influence Maximization | | 0
AutoSeM: Automatic Task Selection and Mixing in Multi-Task Learning | | 0
Bag of Policies for Distributional Deep Exploration | | 0
BanditCAT and AutoIRT: Machine Learning Approaches to Computerized Adaptive Testing and Item Calibration | | 0
Bandit Change-Point Detection for Real-Time Monitoring High-Dimensional Data Under Sampling Control | | 0
Bandit Convex Optimization: √T Regret in One Dimension | | 0
Bandit Learning for Diversified Interactive Recommendation | | 0
Bandit Models of Human Behavior: Reward Processing in Mental Disorders | | 0
Bandit Policies for Reliable Cellular Network Handovers in Extreme Mobility | | 0
Bandits Under The Influence (Extended Version) | | 0
Bandit Theory and Thompson Sampling-Guided Directed Evolution for Sequence Optimization | | 0
Batch Bayesian Optimization for Replicable Experimental Design | | 0
Batched Thompson Sampling | | 0
Batched Thompson Sampling for Multi-Armed Bandits | | 0
Bayesian Bandit Algorithms with Approximate Inference in Stochastic Linear Bandits | | 0
Bayesian Best-Arm Identification for Selecting Influenza Mitigation Strategies | | 0
Bayesian Collaborative Bandits with Thompson Sampling for Improved Outreach in Maternal Health Program | | 0
Bayesian decision-making under misspecified priors with applications to meta-learning | | 0
Bayesian-Guided Generation of Synthetic Microbiomes with Minimized Pathogenicity | | 0
Bayesian Learning of Optimal Policies in Markov Decision Processes with Countably Infinite State-Space | | 0
Bayesian learning of the optimal action-value function in a Markov decision process | | 0
Bayesian Mixture Modelling and Inference based Thompson Sampling in Monte-Carlo Tree Search | | 0
Bayesian Optimization-Based Beam Alignment for MmWave MIMO Communication Systems | | 0
Bayesian Optimization with Inexact Acquisition: Is Random Grid Search Sufficient? | | 0
Bayesian Optimization with LLM-Based Acquisition Functions for Natural Language Preference Elicitation | | 0
Bayesian Quantile and Expectile Optimisation | | 0
BBQ-Networks: Efficient Exploration in Deep Reinforcement Learning for Task-Oriented Dialogue Systems | | 0
Belief Flows of Robust Online Learning | | 0
Best Arm Identification in Batched Multi-armed Bandit Problems | | 0
Active RLHF via Best Policy Learning from Trajectory Preference Feedback | | 0
Better Optimism By Bayes: Adaptive Planning with Rich Models | | 0
Blind Exploration and Exploitation of Stochastic Experts | | 0
Bootstrapped Thompson Sampling and Deep Exploration | | 0
BOTS: Batch Bayesian Optimization of Extended Thompson Sampling for Severely Episode-Limited RL Settings | | 0
Calibrated Fairness in Bandits | | 0
Causal Bandits without prior knowledge using separating sets | | 0
Chained Information-Theoretic bounds and Tight Regret Rate for Linear Bandit Problems | | 0
Challenges in Statistical Analysis of Data Collected by a Bandit Algorithm: An Empirical Exploration in Applications to Adaptively Randomized Experiments | | 0
Chimera: A Hybrid Machine Learning Driven Multi-Objective Design Space Exploration Tool for FPGA High-Level Synthesis | | 0
Code Repair with LLMs gives an Exploration-Exploitation Tradeoff | | 0
Bayesian Analysis of Combinatorial Gaussian Process Bandits | | 0
Combinatorial Multi-armed Bandits: Arm Selection via Group Testing | | 0
Combinatorial Multi-armed Bandit with Probabilistically Triggered Arms: A Case with Bounded Regret | | 0

No leaderboard results yet.