SOTAVerified

Multi-Armed Bandits

Multi-armed bandits refer to the task of allocating a fixed amount of resources among competing alternatives so as to maximize expected gain, when each alternative's payoff is only partially known at the time of allocation. These problems typically involve an exploration/exploitation trade-off: sampling poorly understood arms to learn their payoffs versus repeatedly playing the arm that currently looks best.
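A minimal sketch of the trade-off is the classic epsilon-greedy strategy on Bernoulli arms (the function name, parameters, and reward model below are illustrative, not from any paper listed here): with probability epsilon the agent explores a random arm, otherwise it exploits the arm with the highest estimated mean reward.

```python
import random

def epsilon_greedy_bandit(arm_means, n_steps=10_000, epsilon=0.1, seed=0):
    """Epsilon-greedy on Bernoulli arms.

    Returns (pull counts, per-arm mean-reward estimates, cumulative pseudo-regret).
    `arm_means` are the true success probabilities, unknown to the agent.
    """
    rng = random.Random(seed)
    k = len(arm_means)
    counts = [0] * k
    values = [0.0] * k            # running mean reward estimate per arm
    best_mean = max(arm_means)
    regret = 0.0
    for _ in range(n_steps):
        # Explore with probability epsilon; otherwise exploit the best estimate.
        if rng.random() < epsilon:
            arm = rng.randrange(k)
        else:
            arm = max(range(k), key=lambda a: values[a])
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]  # incremental mean
        regret += best_mean - arm_means[arm]                 # expected regret of this pull
    return counts, values, regret
```

With enough steps, most pulls concentrate on the best arm while the epsilon fraction of exploratory pulls keeps the other estimates from going stale.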

(Image credit: Microsoft Research)

Papers

Showing 1231–1240 of 1262 papers

Title | Hype
Constrained Policy Optimization for Controlled Self-Learning in Conversational AI Systems | 0
Constrained Pure Exploration Multi-Armed Bandits with a Fixed Budget | 0
Context-Aware Bandits | 0
Contexts can be Cheap: Solving Stochastic Contextual Bandits with Linear Bandit Algorithms | 0
Contextual Bandit Applications in Customer Support Bot | 0
Contextual Bandits and Imitation Learning via Preference-Based Active Queries | 0
Contextual Bandits and Optimistically Universal Learning | 0
Contextual Bandits Evolving Over Finite Time | 0
Contextual Bandits for adapting to changing User preferences over time | 0
Contextual Bandits for Advertising Budget Allocation | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | NeuralLinear FullPosterior-MR | Cumulative regret | 1.92 | — | Unverified
2 | Linear FullPosterior-MR | Cumulative regret | 1.82 | — | Unverified
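The benchmark metric above, cumulative regret, is the total expected reward lost relative to always playing the best arm. A minimal sketch of the computation, assuming the true arm means are known for evaluation (function and variable names are illustrative):

```python
def cumulative_regret(arm_means, pulls):
    """Cumulative pseudo-regret of a sequence of arm pulls.

    `arm_means` are the true expected rewards; `pulls` is the list of
    arm indices the policy chose at each step.
    """
    best = max(arm_means)
    return sum(best - arm_means[a] for a in pulls)
```

For example, with arms of mean 0.2 and 0.8, every pull of the weaker arm adds 0.6 to the regret, so lower cumulative regret indicates a policy that found and stuck with the better arm sooner.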