The Foes of Neural Network’s Data Efficiency Among Unnecessary Input Dimensions

2021-01-01

Vanessa D'Amario, Sanjana Srivastava, Tomotake Sasaki, Xavier Boix

Abstract

Input dimensions are unnecessary for a given task when the target function can be expressed without them. An object's background in image recognition and redundant sentences in text classification are examples of unnecessary dimensions that are often present in datasets. Deep neural networks achieve remarkable generalization performance despite the presence of unnecessary dimensions, but it is unclear whether, or how, these dimensions negatively affect neural networks. In this paper, we investigate the impact of unnecessary input dimensions on one of the central issues of machine learning: the number of training examples needed to achieve high generalization performance, which we refer to as the network's data efficiency. In a series of analyses with multi-layer perceptrons and deep convolutional neural networks, we show that the network's data efficiency depends on whether the unnecessary dimensions are task-unrelated or task-related (unnecessary due to redundancy). Namely, we demonstrate that increasing the number of task-unrelated dimensions leads to an incorrect inductive bias and as a result degrades data efficiency, while increasing the number of task-related dimensions helps to alleviate the negative impact of the task-unrelated dimensions. These results highlight the need for mechanisms that remove task-unrelated dimensions, such as crops or foveation for image classification, to enable data efficiency gains.
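To make the two kinds of unnecessary dimensions concrete, here is a minimal NumPy sketch (not from the paper; dataset and dimension counts are illustrative) that builds a toy classification input with task-unrelated dimensions (independent noise, analogous to a background) versus task-related dimensions (redundant copies of the informative features, so the target is still expressible without them):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dataset: each example has d_info informative dimensions that
# fully determine the label (here, the sign of their sum).
n, d_info = 1000, 5
X_info = rng.normal(size=(n, d_info))
y = (X_info.sum(axis=1) > 0).astype(int)

# Task-unrelated unnecessary dimensions: independent noise appended
# to the input, analogous to an object's background in an image.
d_unrelated = 20
X_unrelated = np.concatenate(
    [X_info, rng.normal(size=(n, d_unrelated))], axis=1
)

# Task-related unnecessary dimensions: redundant copies of the
# informative dimensions; the target function can be expressed
# without any single copy.
n_copies = 4
X_redundant = np.concatenate([X_info] * n_copies, axis=1)

print(X_unrelated.shape)  # (1000, 25)
print(X_redundant.shape)  # (1000, 20)
```

Training the same network on `X_info`, `X_unrelated`, and `X_redundant` at varying training-set sizes is one way to probe how each kind of dimension affects data efficiency, in the spirit of the analyses described above.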