ANETAC: Arabic Named Entity Transliteration and Classification Dataset
2019-07-06
Mohamed Seghir Hadj Ameur, Farid Meziane, Ahmed Guessoum
Abstract
In this paper, we make ANETAC, our English-Arabic named entity transliteration and classification dataset, freely accessible. We built it from freely available parallel translation corpora. The dataset contains 79,924 instances. Each instance is a triplet (e, a, c), where e is the English named entity, a is its Arabic transliteration, and c is its class, which is one of Person, Location, or Organization. ANETAC is mainly intended for researchers working on Arabic named entity transliteration, but it can also be used for named entity classification.
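The (e, a, c) triplet structure can be modeled directly in code. The following is a minimal sketch, not the authors' tooling. The abstract does not describe the file layout, so the tab-separated `english<TAB>arabic<TAB>class` format parsed here is an assumption. `EntityClass`, `Instance`, `parse_line`, and `class_counts` are hypothetical names, and the sample records are illustrative only. They are not drawn from the dataset.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class EntityClass(Enum):
    """The three entity classes named in the abstract."""
    PERSON = "Person"
    LOCATION = "Location"
    ORGANIZATION = "Organization"


@dataclass(frozen=True)
class Instance:
    """One instance: the triplet (e, a, c)."""
    e: str          # English named entity
    a: str          # Arabic transliteration
    c: EntityClass  # entity class


def parse_line(line: str) -> Instance:
    """Parse one record.

    ASSUMPTION: a hypothetical tab-separated layout
    'english<TAB>arabic<TAB>class'; the released files may differ.
    Raises ValueError for an unknown class label.
    """
    e, a, c = line.rstrip("\n").split("\t")
    return Instance(e=e.strip(), a=a.strip(), c=EntityClass(c.strip()))


def class_counts(instances):
    """Count instances per class, e.g. to inspect the class balance."""
    return Counter(inst.c for inst in instances)


if __name__ == "__main__":
    # Illustrative records only, not taken from ANETAC.
    lines = ["London\tلندن\tLocation", "Obama\tأوباما\tPerson"]
    data = [parse_line(line) for line in lines]
    print(class_counts(data))
```

Using an `Enum` for c makes an unexpected class label fail at parse time rather than silently producing a fourth class. For transliteration experiments, the (e, a) pairs can be used on their own. For classification, the (e, c) pairs can be used instead.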