
Comparing the Quality of Focused Crawlers and of the Translation Resources Obtained from them

2014-05-01 · LREC 2014

Bruno Laranjeira, Viviane Moreira, Aline Villavicencio, Carlos Ramisch, Maria José Finatto


Abstract

Comparable corpora have been used as an alternative to parallel corpora as resources for computational tasks that involve domain-specific natural language processing. One way to gather documents related to a specific topic of interest is to traverse a portion of the web graph in a targeted way, using focused crawling algorithms. In this paper, we compare several focused crawling algorithms by using them to collect comparable corpora on a specific domain. We then compare the evaluation of the focused crawling algorithms with the performance of linguistic processes executed after training on the corresponding generated corpora. We also propose a novel approach to focused crawling that exploits the expressive power of multiword expressions.
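For readers unfamiliar with the term, the idea of traversing the web graph "in a targeted way" can be illustrated with a minimal best-first crawl. This is only a sketch under stated assumptions, not any of the algorithms compared in the paper: the in-memory `pages` and `links` dictionaries stand in for real HTTP fetching, the term-overlap `relevance` function is a toy scoring rule, and all names are hypothetical.

```python
import heapq

def relevance(text, topic_terms):
    """Toy score (an assumption): fraction of topic terms present in the page text."""
    words = set(text.lower().split())
    return sum(1 for term in topic_terms if term in words) / len(topic_terms)

def focused_crawl(pages, links, seeds, topic_terms, budget):
    """Best-first traversal: expand first the URLs whose linking page scored highest."""
    # heapq is a min-heap, so priorities are negated to pop the best candidate first.
    frontier = [(-1.0, url) for url in seeds]
    heapq.heapify(frontier)
    seen = set(seeds)
    collected = []
    while frontier and len(collected) < budget:
        _, url = heapq.heappop(frontier)
        score = relevance(pages[url], topic_terms)
        if score > 0:
            collected.append(url)  # keep pages that mention the topic at all
        for target in links.get(url, []):
            if target not in seen:
                seen.add(target)
                # Outlinks inherit the score of the page that links to them.
                heapq.heappush(frontier, (-score, target))
    return collected

# Hypothetical five-page web graph used only for illustration.
pages = {
    "seed": "cardiology heart disease overview",
    "a": "heart valve surgery cardiology",
    "b": "football results and scores",
    "c": "heart failure treatment",
    "d": "celebrity gossip news",
}
links = {"seed": ["a", "b"], "a": ["c"], "b": ["d"]}

result = focused_crawl(pages, links, ["seed"], ["heart", "cardiology"], budget=3)
print(result)  # ['seed', 'a', 'c']
```

In this toy run the off-topic page "b" is fetched but not kept, and its outlink "d" is never fetched because links found on relevant pages are expanded first. Real focused crawlers differ mainly in how they estimate the priority of an unvisited link.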
