Regularized Training of Nearest Neighbor Language Models

2021-09-16 · NAACL (ACL) 2022

Jean-Francois Ton, Walter Talbott, Shuangfei Zhai, Josh Susskind

Abstract

Including memory banks in a natural language processing architecture increases model capacity by giving it access to additional data at inference time. In this paper, we build upon kNN-LM (Khandelwal et al., 2020), which combines a pre-trained language model with an exhaustive kNN search over the training data (the memory bank) to achieve state-of-the-art results. We investigate whether kNN-LM performance can be improved by instead training the LM with the knowledge that a kNN search will be applied post hoc. Our method achieves significant improvements on language modeling tasks on WIKI-2 and WIKI-103. The main phenomenon we observe is that adding a simple L2 regularization on the activations (not the weights) of the model, a Transformer, improves post-hoc kNN classification performance. We explore some possible reasons for this improvement. In particular, we find that the added L2 regularization appears to improve performance on high-frequency words without degrading performance on low-frequency ones.
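The two ingredients named in the abstract can be illustrated with a minimal pure-Python sketch. The kNN part follows the kNN-LM formulation of Khandelwal et al. (2020): search the whole memory bank of (context representation, next token) pairs, take the k nearest keys by squared L2 distance, softmax their negative distances into p_kNN, and interpolate with the LM distribution using a weight lam. The training loss adds beta * ||h||^2 on an activation vector h. The exact form of this penalty is an assumption, as are the layer it applies to, the value of beta, and the assumption that h is also the representation used as kNN keys. None of these is specified in the abstract. All function names are hypothetical.

```python
import math


def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def knn_distribution(query, keys, values, vocab_size, k):
    """p_kNN(y | x): softmax over negative squared L2 distances to the k
    nearest memory-bank keys, with probability mass summed per target token."""
    # Exhaustive search: score every (key, value) pair in the memory bank.
    scored = [(squared_l2(query, key), val) for key, val in zip(keys, values)]
    nearest = sorted(scored)[:k]
    # Subtract the max logit (-distance) for a numerically stable softmax.
    m = max(-d for d, _ in nearest)
    weights = [(math.exp(-d - m), v) for d, v in nearest]
    total = sum(w for w, _ in weights)
    probs = [0.0] * vocab_size
    for w, v in weights:
        probs[v] += w / total
    return probs


def knn_lm(p_lm, p_knn, lam):
    """Interpolate the two distributions: lam * p_kNN + (1 - lam) * p_LM."""
    return [lam * a + (1 - lam) * b for a, b in zip(p_knn, p_lm)]


def regularized_loss(logits, target, hidden, beta):
    """Cross-entropy on the next token plus beta * ||h||^2 on the
    activation vector h (hypothetical form of the activation penalty)."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(l - m) for l in logits))
    cross_entropy = log_z - logits[target]
    return cross_entropy + beta * sum(h * h for h in hidden)
```

For example, with a toy memory bank, `knn_distribution([0.0, 0.0], [[0.0, 0.0], [1.0, 0.0], [0.0, 3.0]], [1, 1, 2], 3, 2)` places all its mass on token 1, because both nearest keys map to that token. Setting `beta=0` in `regularized_loss` recovers the plain cross-entropy objective of an unregularized LM.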

Tasks

L2 Regularization, Language Modeling