
Thompson Sampling

Thompson sampling, named after William R. Thompson, is a heuristic for choosing actions that addresses the exploration-exploitation dilemma in the multi-armed bandit problem. It consists of choosing the action that maximizes the expected reward with respect to a randomly drawn belief.
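The heuristic above can be sketched for the simplest setting: a Bernoulli multi-armed bandit with independent Beta(1, 1) priors on each arm's success probability. At every round, one belief is sampled per arm from its posterior, the arm whose sample is largest is played, and the posterior of that arm is updated with the observed reward. The function and variable names below (`thompson_sampling`, `true_probs`) are illustrative, not from any particular library.

```python
import random

def thompson_sampling(true_probs, n_rounds=1000, seed=0):
    """Beta-Bernoulli Thompson sampling on a simple multi-armed bandit."""
    rng = random.Random(seed)
    n_arms = len(true_probs)
    # Beta(1, 1) priors: alpha counts successes + 1, beta counts failures + 1.
    alpha = [1] * n_arms
    beta = [1] * n_arms
    total_reward = 0
    for _ in range(n_rounds):
        # Draw one belief sample per arm from its posterior, then play the
        # arm whose sampled mean reward is largest (randomized exploration).
        samples = [rng.betavariate(alpha[i], beta[i]) for i in range(n_arms)]
        arm = max(range(n_arms), key=lambda i: samples[i])
        # Observe a Bernoulli reward and update that arm's posterior.
        reward = 1 if rng.random() < true_probs[arm] else 0
        alpha[arm] += reward
        beta[arm] += 1 - reward
        total_reward += reward
    return total_reward, alpha, beta

reward, alpha, beta = thompson_sampling([0.2, 0.5, 0.8])
```

Because arms with uncertain posteriors occasionally produce large samples, the algorithm keeps exploring them, while most pulls concentrate on the arm whose posterior mean is highest; no explicit exploration parameter is needed.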

Papers

Showing 201–250 of 655 papers

Title | Status | Hype
----- | ------ | ----
Online Continuous Hyperparameter Optimization for Generalized Linear Contextual Bandits | | 0
Approximate Thompson Sampling via Epistemic Neural Networks | Code | 1
A Bandit Approach to Online Pricing for Heterogeneous Edge Resource Allocation | | 0
Learning How to Infer Partial MDPs for In-Context Adaptation and Exploration | | 0
Leveraging Demonstrations to Improve Online Learning: Quality Matters | | 0
Optimality of Thompson Sampling with Noninformative Priors for Pareto Bandits | | 0
Two-sided Competing Matching Recommendation Markets With Quota and Complementary Preferences Constraints | Code | 0
Differentially Private Online Bayesian Estimation With Adaptive Truncation | Code | 0
A Combinatorial Semi-Bandit Approach to Charging Station Selection for Electric Vehicles | | 0
Thompson Sampling with Diffusion Generative Prior | | 0
Reinforcement Learning in Credit Scoring and Underwriting | | 0
Neural Bandits for Data Mining: Searching for Dangerous Polypharmacy | Code | 0
Online Learning-based Waveform Selection for Improved Vehicle Recognition in Automotive Radar | | 0
Monte Carlo Tree Search Algorithms for Risk-Aware and Multi-Objective Reinforcement Learning | | 0
Thompson Sampling for High-Dimensional Sparse Linear Contextual Bandits | Code | 0
Atlas: Automate Online Service Configuration in Network Slicing | Code | 0
Optimizing Pessimism in Dynamic Treatment Regimes: A Bayesian Learning Approach | Code | 0
Meta Learning of Interface Conditions for Multi-Domain Physics-Informed Neural Networks | | 0
Sample-Then-Optimize Batch Neural Thompson Sampling | Code | 1
Deep Active Ensemble Sampling For Image Classification | | 0
The Typical Behavior of Bandit Algorithms | | 0
Cost Aware Asynchronous Multi-Agent Active Search | | 0
Thompson Sampling with Virtual Helping Agents | | 0
Double Doubly Robust Thompson Sampling for Generalized Linear Contextual Bandits | | 0
A Nonparametric Contextual Bandit with Arm-level Eligibility Control for Customer Service Routing | | 0
Sample Efficient Learning of Factored Embeddings of Tensor Fields | | 0
Causal Bandits for Linear Structural Equation Models | Code | 0
Dynamic collaborative filtering Thompson Sampling for cross-domain advertisements recommendation | | 0
A Provably Efficient Model-Free Posterior Sampling Method for Episodic Reinforcement Learning | | 0
Non-Stationary Dynamic Pricing Via Actor-Critic Information-Directed Pricing | | 0
Increasing Students' Engagement to Reminder Emails Through Multi-Armed Bandits | | 0
Using Adaptive Experiments to Rapidly Help Students | | 0
Bayesian Optimization-Based Beam Alignment for MmWave MIMO Communication Systems | | 0
SPRT-based Efficient Best Arm Identification in Stochastic Bandits | | 0
Chimera: A Hybrid Machine Learning Driven Multi-Objective Design Space Exploration Tool for FPGA High-Level Synthesis | | 0
Ranking In Generalized Linear Bandits | Code | 0
Risk-averse Contextual Multi-armed Bandit Problem with Linear Payoffs | | 0
Langevin Monte Carlo for Contextual Bandits | Code | 1
Analysis of Thompson Sampling for Controlling Unknown Linear Diffusion Processes | | 0
Thompson Sampling for (Combinatorial) Pure Exploration | | 0
Thompson Sampling for Robust Transfer in Multi-Task Bandits | Code | 0
Thompson Sampling Achieves Õ(√T) Regret in Linear Quadratic Control | | 0
A Contextual Combinatorial Semi-Bandit Approach to Network Bottleneck Identification | | 0
On Provably Robust Meta-Bayesian Optimization | Code | 0
Top Two Algorithms Revisited | | 0
Regret Bounds for Information-Directed Reinforcement Learning | | 0
A Simple and Optimal Policy Design with Safety against Heavy-Tailed Risk for Stochastic Bandits | | 0
Finite-Time Regret of Thompson Sampling Algorithms for Exponential Family Multi-Armed Bandits | | 0
Bandit Theory and Thompson Sampling-Guided Directed Evolution for Sequence Optimization | | 0
Incentivizing Combinatorial Bandit Exploration | | 0
Page 5 of 14

No leaderboard results yet.