FeelsGoodMan: Inferring Semantics of Twitch Neologisms

2021-11-16 · ACL ARR November 2021

Anonymous


Abstract

Twitch chat messages pose a unique problem in natural language understanding due to a large presence of neologisms, specifically emotes. There are a total of 8.06 million emotes, over 400k of which were observed during the study period. There is virtually no information on the meaning or sentiment of emotes, and with a constant influx of new emotes and drift in both their frequencies and their perceived meanings, it becomes impossible to maintain an updated manually-labeled dataset. Our paper makes a two-fold contribution. First, we establish a new baseline for sentiment analysis on Twitch data, outperforming the previous benchmark by 7.36 percentage points. Second, we introduce a simple but powerful unsupervised framework based on word embeddings and k-NN to enrich existing models with out-of-vocabulary knowledge. This framework allows us to auto-generate an emote pseudo-dictionary, and we show that we can nearly match the supervised benchmark above, even when injecting such emote knowledge into sentiment classifiers trained on extraneous datasets such as movie reviews or Twitter.
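The abstract gives no implementation details for the embedding and k-NN framework, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes each emote is scored by the similarity-weighted mean sentiment of its k nearest labeled words under cosine similarity. The function names, the two-dimensional vectors, the seed lexicon, and the sentiment values are all hypothetical toy choices.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def knn_sentiment(token, embeddings, lexicon, k=3):
    """Score a token from its k nearest labeled neighbors in embedding space.

    Assumption: the score is the similarity-weighted mean of the neighbors'
    sentiment values; negative similarities are clipped to zero weight.
    """
    target = embeddings[token]
    scored = [
        (cosine(target, embeddings[word]), sentiment)
        for word, sentiment in lexicon.items()
        if word in embeddings and word != token
    ]
    nearest = sorted(scored, key=lambda pair: pair[0], reverse=True)[:k]
    total_weight = sum(max(sim, 0.0) for sim, _ in nearest)
    if total_weight == 0.0:
        return 0.0  # no usable neighbors: fall back to neutral
    return sum(max(sim, 0.0) * s for sim, s in nearest) / total_weight


def build_pseudo_dictionary(emotes, embeddings, lexicon, k=3):
    """Map each emote that has an embedding to an inferred sentiment score."""
    return {
        emote: knn_sentiment(emote, embeddings, lexicon, k)
        for emote in emotes
        if emote in embeddings
    }


# Hypothetical toy embeddings; real ones would be trained on chat logs.
EMBEDDINGS = {
    "good": [0.9, 0.1], "great": [0.8, 0.2], "happy": [0.85, 0.15],
    "bad": [-0.9, 0.1], "awful": [-0.8, 0.2], "sad": [-0.85, 0.15],
    "FeelsGoodMan": [0.7, 0.3], "FeelsBadMan": [-0.7, 0.3],
}
# Hypothetical seed lexicon: +1.0 for positive words, -1.0 for negative ones.
LEXICON = {
    "good": 1.0, "great": 1.0, "happy": 1.0,
    "bad": -1.0, "awful": -1.0, "sad": -1.0,
}

pseudo_dictionary = build_pseudo_dictionary(
    ["FeelsGoodMan", "FeelsBadMan"], EMBEDDINGS, LEXICON, k=3
)
```

In this toy setup the resulting dictionary could be merged into the vocabulary of a lexicon-based classifier trained on another domain, which is the kind of knowledge injection the abstract describes.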

Tasks

Natural Language Understanding, Sentiment Analysis, Word Embeddings
