Complexity of Vector-valued Prediction: From Linear Models to Stochastic Convex Optimization
Matan Schliserman, Tomer Koren
Abstract
We study the problem of learning vector-valued linear predictors: these are prediction rules parameterized by a matrix that maps an m-dimensional feature vector to a k-dimensional target. We focus on the fundamental case with a convex and Lipschitz loss function, and show several new theoretical results that shed light on the complexity of this problem and its connection to related learning models. First, we give a tight characterization of the sample complexity of Empirical Risk Minimization (ERM) in this setting, establishing that Ω(k/ε²) examples are necessary for ERM to reach ε excess (population) risk; this provides an exponential improvement over recent results by Magen and Shamir (2023) in terms of the dependence on the target dimension k, and matches a classical upper bound due to Maurer (2016). Second, we present a black-box conversion from general d-dimensional Stochastic Convex Optimization (SCO) to vector-valued linear prediction, showing that any SCO problem can be embedded as a prediction problem with k=Θ(d) outputs. These results portray the setting of vector-valued linear prediction as bridging between two extensively studied yet disparate learning models: linear models (corresponding to k=1) and general d-dimensional SCO (with k=Θ(d)).
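For concreteness, the setting described in the abstract can be formalized as follows. This is a standard sketch of vector-valued linear prediction under the stated assumptions (convex, Lipschitz loss); the specific symbols W, x, y, ℓ, D are notational choices for illustration, not taken verbatim from the paper.

```latex
% A predictor is a matrix W \in \mathbb{R}^{k \times m} mapping a feature
% vector x \in \mathbb{R}^m to a prediction Wx \in \mathbb{R}^k.
% Given a convex, Lipschitz loss \ell and a distribution D over examples
% (x, y), the population (excess) risk of W is measured via
\[
  F(W) \;=\; \mathbb{E}_{(x,y)\sim D}\bigl[\ell(Wx,\, y)\bigr],
\]
% and ERM returns a minimizer of the empirical average over n i.i.d. samples:
\[
  \widehat{F}(W) \;=\; \frac{1}{n}\sum_{i=1}^{n} \ell(W x_i,\, y_i).
\]
```

In this notation, the first result states that n = Ω(k/ε²) samples are necessary for an ERM of F̂ to guarantee F(W) within ε of the minimum; setting k=1 recovers scalar linear models, while k=Θ(d) captures general d-dimensional SCO via the paper's embedding.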