
ASPEC: Asian Scientific Paper Excerpt Corpus

2016-05-01 · LREC 2016

Toshiaki Nakazawa, Manabu Yaguchi, Kiyotaka Uchimoto, Masao Utiyama, Eiichiro Sumita, Sadao Kurohashi, Hitoshi Isahara


Abstract

In this paper, we describe the details of ASPEC (Asian Scientific Paper Excerpt Corpus), the first large-scale parallel corpus in the scientific paper domain. ASPEC was constructed as part of the Japanese-Chinese machine translation project, conducted between 2006 and 2010 with funding from the Special Coordination Funds for Promoting Science and Technology. It consists of a Japanese-English scientific paper abstract corpus of approximately 3 million parallel sentences (ASPEC-JE) and a Chinese-Japanese scientific paper excerpt corpus of approximately 0.68 million parallel sentences (ASPEC-JC). ASPEC is used as the official dataset for the machine translation evaluation workshop WAT (Workshop on Asian Translation).
