Transliteration

Transliteration is the conversion of a word from a source (foreign) language into a target language, and it often adopts approaches from machine translation. In machine translation, the objective is to preserve the semantic meaning of the utterance as much as possible while following the syntactic structure of the target language. In transliteration, the objective is to preserve the original pronunciation of the source word as much as possible while following the phonological structures of the target language.

For example, the city name “Manchester” has become well known to speakers of languages other than English. Such borrowed words are often named entities that are important in cross-lingual information retrieval, information extraction, and machine translation, and they often present out-of-vocabulary challenges to spoken language technologies such as automatic speech recognition, spoken keyword search, and text-to-speech.

Source: Phonology-Augmented Statistical Framework for Machine Transliteration using Limited Linguistic Resources
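
As a rough illustration of the idea described above, the following sketch rewrites “Manchester” into Russian Cyrillic with a greedy longest-match rule table. It is a toy, not the statistical framework from the cited source: the `RULES` table and the `transliterate` function are hypothetical names introduced here, the table covers only the letters needed for this one word, and case is dropped for simplicity. Practical systems typically learn such mappings from data rather than hand-writing them.

```python
# Toy grapheme mapping, Latin -> Russian Cyrillic.
# Hypothetical and deliberately incomplete: it only covers this example.
RULES = {
    "ch": "ч",  # multi-letter source units must be tried before single letters
    "a": "а",
    "e": "е",
    "m": "м",
    "n": "н",
    "r": "р",
    "s": "с",
    "t": "т",
}


def transliterate(word, rules=RULES):
    """Greedy longest-match rewrite; unmapped characters pass through unchanged."""
    text = word.lower()
    # Try longer source units first so "ch" wins over a lone "c" or "h".
    keys = sorted(rules, key=len, reverse=True)
    out = []
    i = 0
    while i < len(text):
        for key in keys:
            if text.startswith(key, i):
                out.append(rules[key])
                i += len(key)
                break
        else:
            # No rule matched at this position: copy the character as-is.
            out.append(text[i])
            i += 1
    return "".join(out)


print(transliterate("Manchester"))  # prints "манчестер"
```

The longest-match ordering is the one design choice that matters here: treating “ch” as a single unit keeps the output close to the source pronunciation, which a letter-by-letter substitution would not.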

Papers

Showing 1–25 of 435 papers

Title	Date	Tasks	Status	Hype
GR-NLP-TOOLKIT: An Open-Source NLP Toolkit for Modern Greek	Dec 11, 2024	Dependency Parsing, Morphological Tagging	Code Available	2
Aksharantar: Open Indic-language Transliteration datasets and models for the Next Billion Users	May 6, 2022	Transliteration	Code Available	2
Multilingual Text-to-Speech Synthesis for Turkic Languages Using Transliteration	May 25, 2023	Speech Synthesis, text-to-speech	Code Available	1
Processing South Asian Languages Written in the Latin Script: the Dakshina Dataset	Jul 2, 2020	Language Modeling, Language Modelling	Code Available	1
Taqyim: Evaluating Arabic NLP Tasks Using ChatGPT Models	Jun 28, 2023	Part-Of-Speech Tagging, Sentiment Analysis	Code Available	1
ParaNames: A Massively Multilingual Entity Name Corpus	Feb 28, 2022	named-entity-recognition, Named Entity Recognition	Code Available	1
KLPT – Kurdish Language Processing Toolkit	Nov 1, 2020	Diversity, Lemmatization	Code Available	1
Applying the Transformer to Character-level Transduction	May 20, 2020	Grapheme-to-Phoneme Conversion, Morphological Inflection	Code Available	1
ParaNames 1.0: Creating an Entity Name Corpus for 400+ Languages using Wikidata	May 15, 2024	Multilingual Named Entity Recognition, named-entity-recognition	Code Available	1
ParsiPy: NLP Toolkit for Historical Persian Texts in Python	Mar 22, 2025	Lemmatization, Part-Of-Speech Tagging	Code Available	1
Show Me the World in My Language: Establishing the First Baseline for Scene-Text to Scene-Text Translation	Aug 6, 2023	Machine Translation, Scene Text Editing	Code Available	1
XTREME-UP: A User-Centric Scarce-Data Benchmark for Under-Represented Languages	May 19, 2023	In-Context Learning, Multilingual NLP	Code Available	1
Sub-Character Tokenization for Chinese Pretrained Language Models	Jun 1, 2021	Chinese Word Segmentation, Computational Efficiency	Code Available	1
A machine transliteration tool between Uzbek alphabets	May 19, 2022	Transliteration	Code Available	1
An Ensemble Model of Word-based and Character-based Models for Japanese and Chinese Input Method	Dec 1, 2012	Transliteration	Code Available	1
Beyond Arabic: Software for Perso-Arabic Script Manipulation	Jan 26, 2023	Transliteration	Code Available	1
DeepScribe: Localization and Classification of Elamite Cuneiform Signs Via Deep Learning	Jun 2, 2023	Transliteration	Code Available	1
Leveraging Multilingual News Websites for Building a Kurdish Parallel Corpus	Oct 4, 2020	Articles, Machine Translation	Code Available	1
Question Answering Classification for Amharic Social Media Community Based Questions	Jun 1, 2022	8k, Question Answering	Code Available	1
Agreement on Target-bidirectional Neural Machine Translation	Jun 1, 2016	Machine Translation, Structured Prediction	Unverified	0
A Framework for the Classification and Annotation of Multiword Expressions in Dialectal Arabic	Oct 1, 2014	Entity Extraction using GAN, General Classification	Unverified	0
A Comparison of Entity Matching Methods between English and Japanese Katakana	Oct 1, 2018	Transliteration	Unverified	0
A Digital Swedish-Yiddish/Yiddish-Swedish Dictionary: A Web-Based Dictionary that is also Available Offline	Jun 1, 2022	Transliteration	Unverified	0
A Deep Learning Based Approach to Transliteration	Jul 1, 2018	Deep Learning, Information Retrieval	Unverified	0
amLite: Amharic Transliteration Using Key Map Dictionary	Sep 16, 2015	Transliteration	Unverified	0

No leaderboard results yet.