Bridge-Language Capitalization Inference in Western Iranian: Sorani, Kurmanji, Zazaki, and Tajik
2016-05-01 · LREC 2016
Patrick Littell, David R. Mortensen, Kartik Goyal, Chris Dyer, Lori Levin
Abstract
In Sorani Kurdish, one of the most useful orthographic features in named-entity recognition -- capitalization -- is absent, as the language's Perso-Arabic script does not make a distinction between uppercase and lowercase letters. We describe a system for deriving an inferred capitalization value from closely related languages by phonological similarity, and illustrate the system using several related Western Iranian languages.
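The general idea of borrowing a capitalization signal from a related language can be illustrated with a minimal sketch. This is not the authors' system: it assumes tokens have already been romanized, it swaps in plain normalized edit distance as a crude stand-in for phonological similarity, and the names `infer_capitalization`, `TOY_BRIDGE_LEXICON`, and `min_similarity`, along with the lexicon values, are hypothetical and chosen only for illustration.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed row by row with dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Similarity in [0, 1]; a stand-in for a real phonological metric."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(a, b) / longest


def infer_capitalization(token, bridge_lexicon, min_similarity=0.7):
    """Return the capitalization rate of the most similar bridge-language
    word, or None when nothing is similar enough to trust."""
    best_score, best_rate = 0.0, None
    for word, cap_rate in bridge_lexicon.items():
        score = similarity(token, word.lower())
        if score > best_score:
            best_score, best_rate = score, cap_rate
    if best_score < min_similarity:
        return None
    return best_rate


# Hypothetical toy lexicon: romanized bridge-language word mapped to the
# fraction of occurrences in which it is capitalized. Values are
# illustrative, not measured.
TOY_BRIDGE_LEXICON = {"kurdistan": 1.0, "bajar": 0.0, "hewler": 1.0}

if __name__ == "__main__":
    for tok in ["kurdistan", "kurdstan", "xyz"]:
        print(tok, infer_capitalization(tok, TOY_BRIDGE_LEXICON))
```

The returned value could then serve as one more feature for a named-entity tagger, in place of the capitalization feature the script cannot provide. A realistic version would compare phonological representations rather than raw strings and would estimate capitalization rates from a bridge-language corpus.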