
Multi-Armed Bandits

Multi-armed bandits refer to a task in which a fixed, limited set of resources must be allocated among competing choices (arms) so as to maximize expected gain, when each choice's reward properties are only partially known at the time of allocation. These problems typically involve an exploration/exploitation trade-off: gathering more information about uncertain arms versus playing the arm that currently looks best.

(Image credit: Microsoft Research)
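The exploration/exploitation trade-off is easiest to see in code. Below is a minimal sketch, not taken from any paper listed on this page, of Beta-Bernoulli Thompson Sampling, the algorithm family behind many of the titles below; the arm probabilities, horizon, and function name are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

def thompson_sampling(true_probs, horizon=10_000):
    """Beta-Bernoulli Thompson Sampling sketch; returns per-round cumulative regret.

    `true_probs` (hidden arm success probabilities) and `horizon` are
    illustrative assumptions, not values from any listed paper.
    """
    k = len(true_probs)
    successes = np.zeros(k)  # observed reward-1 counts per arm
    failures = np.zeros(k)   # observed reward-0 counts per arm
    best_mean = max(true_probs)
    regret = np.zeros(horizon)
    for t in range(horizon):
        # Explore: sample a plausible mean for each arm from its Beta posterior.
        theta = rng.beta(successes + 1, failures + 1)
        # Exploit: play the arm that looks best under the sampled means.
        arm = int(np.argmax(theta))
        reward = float(rng.random() < true_probs[arm])
        successes[arm] += reward
        failures[arm] += 1.0 - reward
        # Regret accumulates the expected shortfall vs. always playing the best arm.
        regret[t] = (regret[t - 1] if t else 0.0) + best_mean - true_probs[arm]
    return regret

# Example with three arms of hidden success probabilities (illustrative values).
cum_regret = thompson_sampling([0.3, 0.5, 0.7])
print(f"cumulative regret after {len(cum_regret)} rounds: {cum_regret[-1]:.2f}")
```

Sampling from the posterior, rather than acting on the posterior mean, is what drives exploration here: arms with little data occasionally draw high samples and get tried, while well-estimated good arms are played most of the time.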

Papers

Showing 961–970 of 1262 papers

Title | Status | Hype
Thompson Sampling Algorithms for Cascading Bandits | — | 0
Thompson Sampling for Contextual Bandit Problems with Auxiliary Safety Constraints | — | 0
Thompson sampling for improved exploration in GFlowNets | — | 0
Thompson Sampling for Unsupervised Sequential Selection | — | 0
Thompson sampling for zero-inflated count outcomes with an application to the Drink Less mobile health study | — | 0
Thompson Sampling in Partially Observable Contextual Bandits | — | 0
Thompson Sampling Regret Bounds for Contextual Bandits with sub-Gaussian rewards | — | 0
Thresholding Data Shapley for Data Cleansing Using Multi-Armed Bandits | — | 0
Tight Gap-Dependent Memory-Regret Trade-Off for Single-Pass Streaming Stochastic Multi-Armed Bandits | — | 0
Tight Lower Bounds for Combinatorial Multi-Armed Bandits | — | 0
Page 97 of 127

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | NeuralLinear FullPosterior-MR | Cumulative regret | 1.92 | — | Unverified
2 | Linear FullPosterior-MR | Cumulative regret | 1.82 | — | Unverified
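For reference, the cumulative regret reported above is the standard bandit performance measure: the expected shortfall from always playing the best arm. With arm means $\mu_a$, optimal mean $\mu^* = \max_a \mu_a$, and $a_t$ the arm played at round $t$:

$$R(T) = T\,\mu^* - \mathbb{E}\!\left[\sum_{t=1}^{T} \mu_{a_t}\right]$$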