DEMix Layers: Disentangling Domains for Modular Language Modeling

2021-10-16 · ACL ARR October 2021

Anonymous

Abstract

We introduce a new domain expert mixture (DEMix) layer that enables conditioning a language model (LM) on the domain of the input text. A DEMix layer is a collection of expert feedforward networks, each specialized to a domain, that makes the LM modular: experts can be mixed, added, or removed after initial training. Extensive experiments with autoregressive transformer LMs (up to 1.3B parameters) show that DEMix layers reduce perplexity, increase training efficiency, and enable rapid adaptation. Mixing experts during inference, using a parameter-free weighted ensemble, enables better generalization to heterogeneous or unseen domains. Adding experts incorporates new domains without forgetting older ones, and removing experts restricts access to unwanted domains without additional training. Overall, these results demonstrate benefits of explicitly conditioning on textual domains during language modeling.
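The abstract's "parameter-free weighted ensemble" can be illustrated with a small sketch. This is a hypothetical illustration, not the authors' implementation: each domain expert scores the observed prefix with a log-likelihood, and Bayes' rule turns those scores into mixture weights over the experts' next-token distributions, introducing no new trained parameters. The function names and the two-expert example are invented for illustration.

```python
import math

def posterior_weights(prefix_log_likelihoods, prior=None):
    """Weight each domain expert by P(domain | prefix) via Bayes' rule.

    Assumes each expert reports the log-likelihood it assigns to the
    prefix seen so far; no parameters are learned for the mixing itself.
    """
    n = len(prefix_log_likelihoods)
    if prior is None:
        prior = [1.0 / n] * n  # uniform prior over domains
    scores = [ll + math.log(p) for ll, p in zip(prefix_log_likelihoods, prior)]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]  # numerically stable softmax
    z = sum(exps)
    return [e / z for e in exps]

def mix_next_token_probs(expert_probs, weights):
    """Ensemble per-expert next-token distributions with the given weights."""
    vocab = len(expert_probs[0])
    return [sum(w * p[i] for w, p in zip(weights, expert_probs))
            for i in range(vocab)]

# Toy example: two experts over a two-token vocabulary; the first expert
# explains the prefix far better, so it dominates the mixture.
weights = posterior_weights([-10.0, -25.0])
mixed = mix_next_token_probs([[0.7, 0.3], [0.1, 0.9]], weights)
```

Because the weights come from the experts' own prefix likelihoods, the ensemble adapts per input: a heterogeneous or unseen-domain prefix simply spreads probability mass across several experts.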
