zNLP: Identifying Parallel Sentences in Chinese-English Comparable Corpora
2017-08-01 · WS 2017
Zheng Zhang, Pierre Zweigenbaum
Abstract
This paper describes the zNLP system for the BUCC 2017 shared task. Our system identifies parallel sentence pairs in Chinese-English comparable corpora in three steps: it translates Chinese sentences into English word by word, uses the search engine Solr to select near-parallel candidate sentences, and then applies an SVM classifier to identify the true parallel sentences among those candidates. It obtains an F1-score of 45% on the test set and 32% on the training set.
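The pipeline described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the tiny lexicon, the function names, and the example sentences are all hypothetical, a simple token-overlap score stands in for Solr's retrieval ranking, and the SVM classification step is omitted. The F1 helper only shows how the reported metric is computed from predicted and gold sentence pairs.

```python
# Hypothetical toy lexicon; a real system would load a large bilingual dictionary.
TOY_LEXICON = {
    "我": "i",
    "喜欢": "like",
    "猫": "cats",
    "今天": "today",
    "下雨": "rain",
}


def translate_word_by_word(zh_tokens, lexicon):
    """Map each Chinese token to its lexicon entry, dropping unknown tokens."""
    return [lexicon[t] for t in zh_tokens if t in lexicon]


def overlap_score(query_tokens, candidate_tokens):
    """Jaccard overlap of token sets (a crude stand-in for a search-engine score)."""
    q, c = set(query_tokens), set(candidate_tokens)
    if not q or not c:
        return 0.0
    return len(q & c) / len(q | c)


def retrieve_candidates(query_tokens, en_sentences, top_k=2):
    """Return up to top_k English sentences with non-zero overlap, best first."""
    scored = [(overlap_score(query_tokens, s.lower().split()), s) for s in en_sentences]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [(score, s) for score, s in scored[:top_k] if score > 0]


def f1_score(predicted_pairs, gold_pairs):
    """F1 over sets of (source_id, target_id) pairs."""
    predicted, gold = set(predicted_pairs), set(gold_pairs)
    true_positives = len(predicted & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    english_side = ["I like cats", "It will rain today", "Dogs like bones"]
    query = translate_word_by_word(["我", "喜欢", "猫"], TOY_LEXICON)
    for score, sentence in retrieve_candidates(query, english_side):
        print(f"{score:.2f}  {sentence}")
```

In a fuller version, the retrieved candidates would be passed to a trained classifier that decides which pairs are truly parallel; the overlap score here could serve as one of its input features.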