ERM and RERM are optimal estimators for regression problems when malicious outliers corrupt the labels
Geoffrey Chinot
Abstract
We study Empirical Risk Minimizers (ERM) and Regularized Empirical Risk Minimizers (RERM) for regression problems with convex and $L$-Lipschitz loss functions. We consider a setting where $|\mathcal{O}|$ malicious outliers contaminate the labels. In that case, under a local Bernstein condition, we show that the $L_2$-error rate is bounded by $r_N + AL|\mathcal{O}|/N$. Here $N$ is the total number of observations, $r_N$ is the $L_2$-error rate in the non-contaminated setting, and $A$ is a parameter coming from the local Bernstein condition. When $r_N$ is minimax-rate-optimal in a non-contaminated setting, the rate $r_N + AL|\mathcal{O}|/N$ is also minimax-rate-optimal when $|\mathcal{O}|$ outliers contaminate the labels. The main results of the paper apply to many non-regularized and regularized procedures under weak assumptions on the noise. We present results for Huber's M-estimators (without penalization or regularized by the $\ell_1$-norm) and for general regularized learning problems in reproducing kernel Hilbert spaces when the noise can be heavy-tailed.
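To make the setting concrete, here is a minimal illustrative sketch, not the paper's procedure or analysis. It makes several assumptions that are not from the source: a linear model with isotropic Gaussian design, Student-t noise as a stand-in for heavy-tailed noise, and a hypothetical adversary that overwrites $|\mathcal{O}|$ labels with a large constant. The Huber ERM is computed by plain gradient descent. The Huber loss with threshold `delta` is convex and `delta`-Lipschitz, so it fits the $L$-Lipschitz assumption with $L = $ `delta`. The helper names `huber_grad` and `fit_huber` are hypothetical. The example only contrasts the Huber ERM with least squares under label corruption. It does not reproduce the bound $r_N + AL|\mathcal{O}|/N$ or check minimax optimality.

```python
import numpy as np


def huber_grad(residual, delta):
    # Derivative of the Huber loss with respect to the residual. It is bounded
    # by delta, which is why the loss is convex and delta-Lipschitz (L = delta).
    return np.clip(residual, -delta, delta)


def fit_huber(X, y, delta=1.0, lr=0.1, n_iter=2000):
    # Huber ERM: minimize (1/N) * sum_i huber(<x_i, t> - y_i) by gradient descent.
    n, d = X.shape
    t = np.zeros(d)
    for _ in range(n_iter):
        r = X @ t - y
        t -= lr * (X.T @ huber_grad(r, delta)) / n
    return t


rng = np.random.default_rng(0)
N, d, n_out = 1000, 5, 50  # n_out plays the role of |O| (assumed values)
beta_star = rng.normal(size=d)
# Isotropic Gaussian design, so ||t - beta_star||_2 equals the L2 prediction error.
X = rng.normal(size=(N, d))
# Heavy-tailed noise (Student-t with 2.5 degrees of freedom, an assumption).
y = X @ beta_star + rng.standard_t(df=2.5, size=N)

# Hypothetical adversary: overwrite the first n_out labels with a huge value.
y_corrupt = y.copy()
y_corrupt[:n_out] = 1e3

beta_huber = fit_huber(X, y_corrupt)
beta_ls = np.linalg.lstsq(X, y_corrupt, rcond=None)[0]

err_huber = np.linalg.norm(beta_huber - beta_star)
err_ls = np.linalg.norm(beta_ls - beta_star)
print(f"Huber ERM L2 error:     {err_huber:.3f}")
print(f"Least squares L2 error: {err_ls:.3f}")
```

Each corrupted label contributes at most `delta` in absolute value to the Huber gradient, so the outliers can shift the Huber estimate only by an amount that scales with $|\mathcal{O}|/N$. Least squares has an unbounded derivative, so the same corrupted labels can move it arbitrarily far.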