
Linear Regression using Heterogeneous Data Batches

2023-09-05

Ayush Jain, Rajat Sen, Weihao Kong, Abhimanyu Das, Alon Orlitsky


Abstract

In many learning applications, data are collected from multiple sources, each providing a batch of samples that by itself is insufficient to learn its input-output relationship. A common approach assumes that the sources fall into one of several unknown subgroups, each with an unknown input distribution and input-output relationship. We consider one of this setup's most fundamental and important manifestations, where the output is a noisy linear combination of the inputs and there are k subgroups, each with its own regression vector. Prior work (Kong et al., 2020) showed that with abundant small batches, the regression vectors can be learned with only a few, $\tilde{\Omega}(k^{3/2})$, medium-size batches with $\tilde{\Omega}(\sqrt{k})$ samples each. However, that work requires the input distribution of all k subgroups to be isotropic Gaussian, and states that removing this assumption is an "interesting and challenging problem". We propose a novel gradient-based algorithm that improves on the existing results in several ways. It extends the applicability of the algorithm by: (1) allowing the subgroups' underlying input distributions to be different, unknown, and heavy-tailed; (2) recovering all subgroups followed by a significant proportion of batches, even for infinite k; (3) removing the separation requirement between the regression vectors; and (4) reducing the number of batches and allowing smaller batch sizes.
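The batch setting described in the abstract can be illustrated with a small simulation. This is a minimal sketch under stated assumptions: each batch comes from one hidden subgroup, inputs are Student-t draws (3 degrees of freedom) scaled by a per-subgroup factor so that the input distributions differ and are heavy-tailed, and the noise is Gaussian. The function names `heavy_tailed`, `make_batches`, and `batch_mse` are hypothetical. The code only generates data in this setting; it does not implement the authors' gradient-based algorithm.

```python
import math
import random


def heavy_tailed(rng, dof=3):
    """Student-t sample with `dof` degrees of freedom (finite variance for dof > 2)."""
    z = rng.gauss(0.0, 1.0)
    chi2 = sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(dof))
    return z / math.sqrt(chi2 / dof)


def make_batches(regressors, num_batches, batch_size, noise_std=0.1, seed=0):
    """Draw batches from a mixture of k linear regressions (illustrative assumption).

    Each batch comes from a single hidden subgroup j, chosen uniformly.
    Inputs are heavy-tailed and scaled by (j + 1), so every subgroup has a
    different input distribution. Returns a list of (j, xs, ys) triples.
    """
    rng = random.Random(seed)
    d = len(regressors[0])
    batches = []
    for _ in range(num_batches):
        j = rng.randrange(len(regressors))  # hidden subgroup label
        xs, ys = [], []
        for _ in range(batch_size):
            x = [(j + 1) * heavy_tailed(rng) for _ in range(d)]
            # Noisy linear response: y = <w_j, x> + Gaussian noise.
            y = sum(w * xi for w, xi in zip(regressors[j], x)) + rng.gauss(0.0, noise_std)
            xs.append(x)
            ys.append(y)
        batches.append((j, xs, ys))
    return batches


def batch_mse(w, xs, ys):
    """Mean squared residual of a candidate regression vector w on one batch."""
    return sum(
        (y - sum(wi * xi for wi, xi in zip(w, x))) ** 2 for x, y in zip(xs, ys)
    ) / len(ys)


if __name__ == "__main__":
    regressors = [[1.0, 0.0, 0.0, 0.0], [0.0, -1.0, 0.5, 0.0]]  # k = 2, d = 4
    batches = make_batches(regressors, num_batches=6, batch_size=3)  # batch_size < d
    for j, xs, ys in batches:
        own = batch_mse(regressors[j], xs, ys)
        other = batch_mse(regressors[1 - j], xs, ys)
        print(f"subgroup {j}: MSE under own w = {own:.4f}, under other w = {other:.4f}")
```

With `batch_size` smaller than the input dimension `d`, the least-squares problem for a single batch has fewer equations than unknowns, so one batch alone cannot pin down its regression vector; this is the sense in which each batch "by itself is insufficient", and why methods in this setting pool information across batches.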
