Transliteration

Transliteration is a mechanism for converting a word from a source (foreign) language into a target language, and it often adopts approaches from machine translation. In machine translation, the objective is to preserve the semantic meaning of an utterance as much as possible while following the syntactic structure of the target language. In transliteration, the objective is to preserve the original pronunciation of the source word as much as possible while following the phonological structure of the target language.

For example, the name of the city “Manchester” has become well known to speakers of languages other than English, who write it in forms adapted to their own languages. Such transliterated words are often named entities, which are important in cross-lingual information retrieval, information extraction, and machine translation. They also often present out-of-vocabulary challenges to spoken language technologies such as automatic speech recognition, spoken keyword search, and text-to-speech.
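As a rough illustration of the idea, the sketch below rewrites “Manchester” into Russian Cyrillic using a small, hypothetical hand-written rule table (`RULES`) and greedy longest-match rewriting; the function name `transliterate` is also illustrative. This is only a toy under stated assumptions: real transliteration systems, including the statistical and neural ones listed below, learn grapheme and phoneme correspondences from data rather than relying on a fixed letter table.

```python
# Hypothetical toy rule table: Latin graphemes -> Russian Cyrillic letters.
# It maps spellings, not pronunciations; a real system would learn these
# correspondences from data and model the target language's phonology.
RULES = {
    "ch": "ч",
    "sh": "ш",
    "a": "а",
    "e": "е",
    "m": "м",
    "n": "н",
    "r": "р",
    "s": "с",
    "t": "т",
}

# Length of the longest source grapheme, so multi-letter rules win over single letters.
MAX_KEY = max(len(k) for k in RULES)


def transliterate(word: str) -> str:
    """Rewrite `word` left to right, always trying the longest matching rule first."""
    src = word.lower()
    out = []
    i = 0
    while i < len(src):
        for size in range(min(MAX_KEY, len(src) - i), 0, -1):
            chunk = src[i:i + size]
            if chunk in RULES:
                out.append(RULES[chunk])
                i += size
                break
        else:
            # No rule matched: copy the character through unchanged.
            out.append(src[i])
            i += 1
    result = "".join(out)
    # Restore initial capitalisation, since the inputs are usually proper nouns.
    if word[:1].isupper():
        result = result[:1].upper() + result[1:]
    return result


print(transliterate("Manchester"))  # Манчестер
```

Trying “ch” before “c” and “h” is what lets a digraph map to a single target letter; a letter-by-letter mapping would have no way to produce “ч” here.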

Source: Phonology-Augmented Statistical Framework for Machine Transliteration using Limited Linguistic Resources

Papers


Showing 1–25 of 435 papers

Title	Date	Tasks	Status	Hype	Score
Aksharantar: Open Indic-language Transliteration datasets and models for the Next Billion Users	May 6, 2022	Transliteration	Code available	2	5
GR-NLP-TOOLKIT: An Open-Source NLP Toolkit for Modern Greek	Dec 11, 2024	Dependency Parsing, Morphological Tagging	Code available	2	5
ParaNames: A Massively Multilingual Entity Name Corpus	Feb 28, 2022	Named Entity Recognition	Code available	1	5
Sub-Character Tokenization for Chinese Pretrained Language Models	Jun 1, 2021	Chinese Word Segmentation, Computational Efficiency	Code available	1	5
Show Me the World in My Language: Establishing the First Baseline for Scene-Text to Scene-Text Translation	Aug 6, 2023	Machine Translation, Scene Text Editing	Code available	1	5
Processing South Asian Languages Written in the Latin Script: the Dakshina Dataset	Jul 2, 2020	Language Modeling	Code available	1	5
Multilingual Text-to-Speech Synthesis for Turkic Languages Using Transliteration	May 25, 2023	Speech Synthesis, Text-to-Speech	Code available	1	5
Applying the Transformer to Character-level Transduction	May 20, 2020	Grapheme-to-Phoneme Conversion, Morphological Inflection	Code available	1	5
ParsiPy: NLP Toolkit for Historical Persian Texts in Python	Mar 22, 2025	Lemmatization, Part-of-Speech Tagging	Code available	1	5
Question Answering Classification for Amharic Social Media Community Based Questions	Jun 1, 2022	8k, Question Answering	Code available	1	5
DeepScribe: Localization and Classification of Elamite Cuneiform Signs Via Deep Learning	Jun 2, 2023	Transliteration	Code available	1	5
XTREME-UP: A User-Centric Scarce-Data Benchmark for Under-Represented Languages	May 19, 2023	In-Context Learning, Multilingual NLP	Code available	1	5
KLPT – Kurdish Language Processing Toolkit	Nov 1, 2020	Diversity, Lemmatization	Code available	1	5
A machine transliteration tool between Uzbek alphabets	May 19, 2022	Transliteration	Code available	1	5
ParaNames 1.0: Creating an Entity Name Corpus for 400+ Languages using Wikidata	May 15, 2024	Multilingual Named Entity Recognition, Named Entity Recognition	Code available	1	5
An Ensemble Model of Word-based and Character-based Models for Japanese and Chinese Input Method	Dec 1, 2012	Transliteration	Code available	1	5
Leveraging Multilingual News Websites for Building a Kurdish Parallel Corpus	Oct 4, 2020	Articles, Machine Translation	Code available	1	5
Beyond Arabic: Software for Perso-Arabic Script Manipulation	Jan 26, 2023	Transliteration	Code available	1	5
Taqyim: Evaluating Arabic NLP Tasks Using ChatGPT Models	Jun 28, 2023	Part-of-Speech Tagging, Sentiment Analysis	Code available	1	5
Event detection in Twitter: A keyword volume approach	Jan 3, 2019	Binary Classification, Event Detection	Code available	0	5
Efficient Sequence Labeling with Actor-Critic Training	Sep 30, 2018	Decision Making, NER	Code available	0	5
Exploiting Language Relatedness for Low Web-Resource Language Model Adaptation: An Indic Languages Study	Jun 7, 2021	Data Augmentation, Language Modeling	Code available	0	5
Design Challenges in Named Entity Transliteration	Aug 7, 2018	Decoder, Transliteration	Code available	0	5
A Multi-cascaded Deep Model for Bilingual SMS Classification	Nov 29, 2019	Classification, General Classification	Code available	0	5
Cross-Lingual Text Classification of Transliterated Hindi and Malayalam	Aug 31, 2021	Benchmarking, Classification	Code available	0	5



No leaderboard results yet.