
Thompson Sampling

Thompson sampling, named after William R. Thompson, is a heuristic for choosing actions that addresses the exploration-exploitation dilemma in the multi-armed bandit problem. It consists of choosing the action that maximizes the expected reward with respect to a randomly drawn belief.
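A minimal sketch of this idea for a Bernoulli bandit (my own illustration, not from any paper listed below): each arm keeps a Beta posterior over its success probability, one sample is drawn per arm, and the arm with the largest sampled value is played. Arm count, true means, and the seed are hypothetical choices; uniform Beta(1, 1) priors are assumed.

```python
import random

def thompson_sampling(successes, failures, rng=random):
    """Draw one sample from each arm's Beta posterior and play
    the arm whose sampled mean reward is largest."""
    samples = [rng.betavariate(s + 1, f + 1)  # Beta(1, 1) prior assumed
               for s, f in zip(successes, failures)]
    return max(range(len(samples)), key=samples.__getitem__)

# Hypothetical two-armed Bernoulli bandit with true means 0.3 and 0.7.
true_means = [0.3, 0.7]
successes = [0, 0]
failures = [0, 0]
rng = random.Random(0)  # fixed seed for reproducibility

for _ in range(2000):
    arm = thompson_sampling(successes, failures, rng)
    if rng.random() < true_means[arm]:
        successes[arm] += 1
    else:
        failures[arm] += 1

pulls = [s + f for s, f in zip(successes, failures)]
```

Because the posterior for the worse arm concentrates below that of the better arm, the better arm (index 1 here) accumulates the vast majority of the pulls over time, which is exactly the exploration-exploitation trade-off resolving itself.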

Papers

Showing 151–200 of 655 papers

Chimera: A Hybrid Machine Learning Driven Multi-Objective Design Space Exploration Tool for FPGA High-Level Synthesis
Code Repair with LLMs gives an Exploration-Exploitation Tradeoff
Bayesian Analysis of Combinatorial Gaussian Process Bandits
Combinatorial Multi-armed Bandits: Arm Selection via Group Testing
BBQ-Networks: Efficient Exploration in Deep Reinforcement Learning for Task-Oriented Dialogue Systems
Combinatorial Neural Bandits
Combining Bayesian Optimization and Lipschitz Optimization
Concurrent Decentralized Channel Allocation and Access Point Selection using Multi-Armed Bandits in multi BSS WLANs
Connecting Thompson Sampling and UCB: Towards More Efficient Trade-offs Between Privacy and Regret
Connections Between Mirror Descent, Thompson Sampling and the Information Ratio
Constrained Contextual Bandit Learning for Adaptive Radar Waveform Selection
Constrained Thompson Sampling for Real-Time Electricity Pricing with Grid Reliability Constraints
Constrained Thompson Sampling for Wireless Link Optimization
A Reinforcement Learning based Reset Policy for CDCL SAT Solvers
A relaxed technical assumption for posterior sampling-based reinforcement learning for control of unknown linear systems
Context Attentive Bandits: Contextual Bandit with Restricted Context
Context Attribution with Multi-Armed Bandit Optimization
Adaptive Portfolio by Solving Multi-armed Bandit via Thompson Sampling
Contextual Bandits for Advertising Budget Allocation
Contextual Bandits with Non-Stationary Correlated Rewards for User Association in MmWave Vehicular Networks
Contextual Bandit with Herding Effects: Algorithms and Recommendation Applications
Contextual Multi-armed Bandit Algorithm for Semiparametric Reward Model
Contextual Multi-Armed Bandits for Causal Marketing
Contextual Thompson Sampling via Generation of Missing Data
Convergence Rates of Posterior Distributions in Markov Decision Process
Convolutional Monte Carlo Rollouts in Go
Cost Aware Asynchronous Multi-Agent Active Search
Cost-efficient Knowledge-based Question Answering with Large Language Models
Asymptotically Optimal Bandits under Weighted Information
Counterfactual Data-Fusion for Online Reinforcement Learners
Counterfactual Inference under Thompson Sampling
Towards Efficient and Optimal Covariance-Adaptive Algorithms for Combinatorial Semi-Bandits
Cover Tree Bayesian Reinforcement Learning
Customized Nonlinear Bandits for Online Response Selection in Neural Conversation Models
Asymptotic Convergence of Thompson Sampling
Debiasing Samples from Online Learning Using Bootstrap
Decentralized Multi-Agent Active Search and Tracking when Targets Outnumber Agents
Deciding What to Learn: A Rate-Distortion Approach
Deconfounded Warm-Start Thompson Sampling with Applications to Precision Medicine
Deep Active Ensemble Sampling For Image Classification
Bayesian Quantile and Expectile Optimisation
An Information-Theoretic Analysis of Thompson Sampling for Logistic Bandits
Deep Contextual Multi-armed Bandits
Deep Exploration for Recommendation Systems
Deep Hierarchy in Bandits
Delay-Adaptive Learning in Generalized Linear Contextual Bandits
Adaptively Optimize Content Recommendation Using Multi Armed Bandit Algorithms in E-commerce
Differentially Private Federated Bayesian Optimization with Distributed Exploration
Diffusion Approximations for Thompson Sampling
A Copula approach for hyperparameter transfer learning
