SOTAVerified

Thompson Sampling

Thompson sampling, named after William R. Thompson, is a heuristic for choosing actions that addresses the exploration-exploitation dilemma in the multi-armed bandit problem. It consists of choosing the action that maximizes the expected reward with respect to a randomly drawn belief.
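As a minimal sketch of the idea described above, the following assumes a Bernoulli bandit with a uniform Beta(1, 1) prior on each arm's success probability. The page specifies neither. The names `thompson_step` and `run_bandit` are hypothetical and used only for illustration. Each round draws one random belief per arm from its posterior and then plays the arm that looks best under that draw.

```python
import random


def thompson_step(successes, failures, rng):
    """Choose an arm by drawing one belief per arm from its Beta posterior.

    Assumes Bernoulli rewards and a uniform Beta(1, 1) prior (an illustrative
    choice, not something stated in the source text).
    """
    samples = [
        rng.betavariate(s + 1, f + 1)  # posterior is Beta(1 + s, 1 + f)
        for s, f in zip(successes, failures)
    ]
    # Act greedily with respect to the sampled belief.
    return max(range(len(samples)), key=samples.__getitem__)


def run_bandit(true_probs, n_rounds=2000, seed=0):
    """Simulate Thompson sampling on a Bernoulli bandit with hidden true_probs."""
    rng = random.Random(seed)
    k = len(true_probs)
    successes = [0] * k
    failures = [0] * k
    for _ in range(n_rounds):
        arm = thompson_step(successes, failures, rng)
        # Observe a 0/1 reward from the chosen arm and update its counts.
        if rng.random() < true_probs[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1
    return successes, failures


if __name__ == "__main__":
    wins, losses = run_bandit([0.2, 0.5, 0.8])
    pulls = [w + l for w, l in zip(wins, losses)]
    print("pulls per arm:", pulls)
```

Because arms with uncertain posteriors occasionally produce high samples, they still get tried (exploration), while arms with consistently high estimates are played most often (exploitation).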

Papers

Showing 631–640 of 655 papers

Title | Status | Hype
Bayesian Optimization with LLM-Based Acquisition Functions for Natural Language Preference Elicitation | | 0
Bayesian Quantile and Expectile Optimisation | | 0
BBQ-Networks: Efficient Exploration in Deep Reinforcement Learning for Task-Oriented Dialogue Systems | | 0
Belief Flows of Robust Online Learning | | 0
Best Arm Identification in Batched Multi-armed Bandit Problems | | 0
Active RLHF via Best Policy Learning from Trajectory Preference Feedback | | 0
Better Optimism By Bayes: Adaptive Planning with Rich Models | | 0
Blind Exploration and Exploitation of Stochastic Experts | | 0
Bootstrapped Thompson Sampling and Deep Exploration | | 0

No leaderboard results yet.