All That Glitters is Not Gold: A Gold Standard of Adjective-Noun Collocations for German

2020-05-01 · LREC 2020

Yana Strakatova, Neele Falk, Isabel Fuhrmann, Erhard Hinrichs, Daniela Rossmann

Abstract

In this paper we present the GerCo dataset of adjective-noun collocations for German, such as alter Freund 'old friend' and tiefe Liebe 'deep love'. The annotation was performed by experts following the annotation scheme introduced in this paper. The resulting dataset contains 4,732 positive and negative instances of collocations and covers all 16 semantic classes of adjectives defined in the German wordnet GermaNet. The dataset can serve as a reliable empirical basis for comparing theoretical frameworks concerned with collocations, or as material for data-driven approaches to the study of collocations, including machine learning experiments. This paper addresses the latter by using the GerCo dataset to evaluate different models on the task of automatic collocation identification. We compare lexical association measures with static and contextualized word embeddings. The experiments show that word embeddings outperform methods based on statistical association measures by a wide margin.
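
As a rough illustration of what a lexical association measure computes, the sketch below scores an adjective-noun pair with pointwise mutual information (PMI). The abstract does not say which measures the paper evaluates, so PMI is an assumption chosen only because it is a common example. The function name `pmi` and all counts are hypothetical and are not taken from GerCo or any corpus.

```python
import math


def pmi(pair_count, adj_count, noun_count, total_pairs):
    """Pointwise mutual information (base 2) of an adjective-noun pair.

    All arguments are raw frequencies over the same set of
    adjective-noun co-occurrences (hypothetical setup).
    """
    p_pair = pair_count / total_pairs
    p_adj = adj_count / total_pairs
    p_noun = noun_count / total_pairs
    return math.log2(p_pair / (p_adj * p_noun))


# Hypothetical toy counts, invented for illustration only.
total = 1_000_000

# A pair that co-occurs more often than chance predicts.
score_collocation = pmi(pair_count=200, adj_count=5_000,
                        noun_count=4_000, total_pairs=total)

# A pair with a very frequent adjective that co-occurs less than chance predicts.
score_free = pmi(pair_count=20, adj_count=50_000,
                 noun_count=4_000, total_pairs=total)

print(score_collocation > score_free)
```

A positive score means the pair occurs together more often than the individual frequencies of its words would predict, and a negative score means less often. A measure of this kind sees only counts, whereas the embedding-based models compared in the paper draw on distributional representations of the words.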
