Tajik-Farsi Persian Transliteration Using Statistical Machine Translation

2012-05-01 · LREC 2012

Chris Irwin Davis

Abstract

Tajik Persian is a dialect of Persian spoken primarily in Tajikistan and written with a modified Cyrillic alphabet. Iranian Persian, or Farsi, as it is natively called, is the lingua franca of Iran and is written with the Persian alphabet, a modified Arabic script. Although the spoken versions of Tajik and Farsi are mutually intelligible to educated speakers of both languages, the difference between the writing systems constitutes a barrier to text compatibility between the two languages. This paper presents a system to transliterate text between these two different Persian dialects that use incompatible writing systems. The system also serves as a mechanism to facilitate sharing of computational linguistic resources between the two languages. This is relevant because of the disparity in resources for Tajik versus Farsi.
