TweetNorm_es: an annotated corpus for Spanish microtext normalization

2014-05-01 · LREC 2014

Iñaki Alegria, Nora Aranberri, Pere Comas, Víctor Fresno, Pablo Gamallo, Lluis Padró, Iñaki San Vicente, Jordi Turmo, Arkaitz Zubiaga

Abstract

In this paper we introduce TweetNorm_es, an annotated corpus of Spanish-language tweets, which we make publicly available under the terms of the CC-BY license. The corpus is intended for the development and testing of microtext normalization systems. It was created for Tweet-Norm, a tweet normalization workshop and shared task, and is the result of a joint annotation effort by several research groups. We describe the methodology used to build the corpus as well as the guidelines followed in the annotation process. We also present a brief overview of the Tweet-Norm shared task, the first evaluation environment in which the corpus was used.
