TDDC: Timely Disclosure Documents Corpus

2020-05-01 · LREC 2020

Nobushige Doi, Yusuke Oda, Toshiaki Nakazawa

Abstract

In this paper, we describe the details of the Timely Disclosure Documents Corpus (TDDC). TDDC was prepared by manually aligning the sentences from past Japanese and English timely disclosure documents in PDF format published by companies listed on the Tokyo Stock Exchange. TDDC consists of approximately 1.4 million parallel sentences in Japanese and English. TDDC was used as the official dataset for the 6th Workshop on Asian Translation to encourage the development of machine translation.
