Optimal Order Simple Regret for Gaussian Process Bandits
Sattar Vakili, Nacime Bouziani, Sepehr Jalali, Alberto Bernacchia, Da-Shan Shiu
Unverified — Be the first to reproduce this paper.
ReproduceAbstract
Consider the sequential optimization of a continuous, possibly non-convex, and expensive to evaluate objective function f. The problem can be cast as a Gaussian Process (GP) bandit where f lives in a reproducing kernel Hilbert space (RKHS). The state of the art analysis of several learning algorithms shows a significant gap between the lower and upper bounds on the simple regret performance. When N is the number of exploration trials and _N is the maximal information gain, we prove an O(_N/N) bound on the simple regret performance of a pure exploration algorithm that is significantly tighter than the existing bounds. We show that this bound is order optimal up to logarithmic factors for the cases where a lower bound on regret is known. To establish these results, we prove novel and sharp confidence intervals for GP models applicable to RKHS elements which may be of broader interest.