Comparing the Quality of Focused Crawlers and of the Translation Resources Obtained from them

2014-05-01 · LREC 2014

Bruno Laranjeira, Viviane Moreira, Aline Villavicencio, Carlos Ramisch, Maria José Finatto


Abstract

Comparable corpora have been used as an alternative to parallel corpora as resources for computational tasks that involve domain-specific natural language processing. One way to gather documents related to a specific topic of interest is to traverse a portion of the web graph in a targeted way, using focused crawling algorithms. In this paper, we compare several focused crawling algorithms by using them to collect comparable corpora on a specific domain. Then, we compare the evaluation of the focused crawling algorithms to the performance of linguistic processes executed after training with the corresponding generated corpora. We also propose a novel approach for focused crawling that exploits the expressive power of multiword expressions.

Tasks

Machine Translation, Translation
