Geographical Erasure in Language Generation

2023-10-23

Pola Schwöbel, Jacek Golebiowski, Michele Donini, Cédric Archambeau, Danish Pruthi

Code

github.com/amazon-science/geographical-erasure-in-language-generation
Official, in paper, PyTorch, ★ 7

Abstract

Large language models (LLMs) encode vast amounts of world knowledge. However, since these models are trained on large swaths of internet data, they are at risk of inordinately capturing information about dominant groups. This imbalance can propagate into generated language. In this work, we study and operationalise a form of geographical erasure, wherein language models underpredict certain countries. We demonstrate consistent instances of erasure across a range of LLMs. We discover that erasure strongly correlates with low frequencies of country mentions in the training corpus. Lastly, we mitigate erasure by finetuning using a custom objective.
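
The abstract does not spell out how erasure is measured. As a minimal sketch, one could call a country "erased" when a reference distribution (for example, population share) assigns it far more probability than the model does. The function name `erasure_report`, the ratio threshold, and the toy numbers below are all hypothetical illustrations; none are taken from the paper or its repository.

```python
import math


def erasure_report(model_probs, reference_probs, threshold=3.0):
    """Return countries the model underpredicts relative to a reference.

    Assumption: a country counts as erased when
    reference_prob / model_prob exceeds `threshold`.
    """
    erased = {}
    for country, p_ref in reference_probs.items():
        p_model = model_probs.get(country, 0.0)
        # A country the model never predicts is maximally underpredicted.
        ratio = math.inf if p_model == 0 else p_ref / p_model
        if ratio > threshold:
            erased[country] = ratio
    return erased


# Toy distributions over three placeholder countries (made-up values).
reference = {"A": 0.5, "B": 0.3, "C": 0.2}
model = {"A": 0.8, "B": 0.15, "C": 0.05}

report = erasure_report(model, reference, threshold=3.0)
print(sorted(report))  # only "C" is underpredicted by more than 3x
```

In practice the model probabilities would come from an LLM's predictions for a prompt that elicits a country name; that step is omitted here because the abstract gives no details about it.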

Tasks

Text Generation, World Knowledge
