Active Learning for Entity Filtering in Microblog Streams

2015-08-01

Damiano Spina, Maria-Hendrike Peetz, Maarten de Rijke

Abstract

Monitoring the reputation of entities such as companies or brands in microblog streams (e.g., Twitter) starts by selecting mentions that are related to the entity of interest. Entities are often ambiguous (e.g., "Jaguar" or "Ford"), and effective methods for selectively removing non-relevant mentions often use background knowledge obtained from domain experts. Manual annotations by experts, however, are costly. We therefore approach the problem of entity filtering with active learning, thereby reducing the annotation load for experts. To this end, we use a strong passive baseline and analyze different sampling methods for selecting samples for annotation. We find that margin sampling, an informative type of sampling that considers the distance to the hyperplane used for class separation, can effectively be used for entity filtering and can significantly reduce the cost of annotating initial training data.
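
To make the idea of margin sampling concrete, the following is a minimal sketch, not the authors' implementation. It assumes a linear classifier with a hypothetical weight vector `weights` and offset `bias`, and a pool of unlabeled mentions already turned into numeric feature vectors. The function names and the toy data are illustrative assumptions only. The sketch ranks pool items by the absolute value of the decision function and returns those closest to the separating hyperplane as candidates for expert annotation.

```python
from typing import List, Sequence


def decision_value(weights: Sequence[float], bias: float,
                   features: Sequence[float]) -> float:
    """Unnormalized signed distance of one item to the hyperplane w.x + b = 0."""
    return sum(w * x for w, x in zip(weights, features)) + bias


def margin_sample(weights: Sequence[float], bias: float,
                  pool: Sequence[Sequence[float]],
                  batch_size: int = 1) -> List[int]:
    """Return indices of the pool items nearest to the hyperplane.

    Dividing by the norm of `weights` would give the true geometric
    distance, but it does not change the ranking, so it is skipped here.
    """
    margins = [abs(decision_value(weights, bias, x)) for x in pool]
    ranked = sorted(range(len(pool)), key=lambda i: margins[i])
    return ranked[:batch_size]


# Toy example (hypothetical numbers): four unlabeled mentions, two features each.
weights = [1.0, -1.0]
bias = 0.0
pool = [[3.0, 0.0], [0.5, 0.4], [0.0, 2.0], [1.0, 1.2]]

# Indices of the two mentions the classifier is least certain about.
picked = margin_sample(weights, bias, pool, batch_size=2)
print(picked)
```

In an active learning loop, the selected items would be labeled by an expert, added to the training set, and the classifier retrained before the next round of sampling.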
