Temporal Language Modeling for Short Text Document Classification with Transformers
Anonymous
Abstract
Language models are typically trained solely on text data, without utilizing document timestamps, which are available in most internet corpora. In this paper, we examine the impact of incorporating timestamps into a transformer language model, both on a downstream classification task and on masked language modeling, using two short-text corpora. We examine different timestamp components: day of the month, month, year, and weekday. We test different methods of incorporating the date into the model: prefixing date components to the text input, and adding trained date embeddings. Our study shows that such a temporal language model outperforms a regular language model on documents both from the training-data time span and from an unseen time span. This holds for both classification and language modeling. Prefixing date components to the text performs no worse than training special date-component embeddings.
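The prefixing method described above can be sketched as a simple preprocessing step before tokenization. The exact component format, ordering, and separator token are assumptions for illustration; the abstract does not specify them.

```python
from datetime import date


def prefix_date_components(text: str, ts: date) -> str:
    """Prepend timestamp components (weekday, day, month, year) to a document.

    The component layout and the "[SEP]" separator are illustrative
    assumptions, not the paper's exact specification.
    """
    components = [
        f"weekday: {ts.strftime('%A')}",
        f"day: {ts.day}",
        f"month: {ts.strftime('%B')}",
        f"year: {ts.year}",
    ]
    # The augmented string is then fed to a standard transformer tokenizer.
    return " ".join(components) + " [SEP] " + text


augmented = prefix_date_components("stocks rallied today", date(2021, 3, 15))
print(augmented)
# → weekday: Monday day: 15 month: March year: 2021 [SEP] stocks rallied today
```

Because the date is encoded as plain text, this variant requires no architectural changes, in contrast to the trained date-embedding approach, which adds dedicated embedding parameters to the model.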