Transliteration

Transliteration converts a word from a source (foreign) language into a target language, and it often adopts approaches from machine translation. In machine translation, the objective is to preserve the semantic meaning of the utterance as much as possible while following the syntactic structure of the target language. In transliteration, the objective is to preserve the original pronunciation of the source word as much as possible while following the phonological structures of the target language.

For example, the city name “Manchester” has become well known among speakers of languages other than English. Such borrowed words are often named entities that are important in cross-lingual information retrieval, information extraction, and machine translation, and they often present out-of-vocabulary challenges to spoken language technologies such as automatic speech recognition, spoken keyword search, and text-to-speech.
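As a rough illustration of the description above, the sketch below maps Latin-script graphemes to Cyrillic with a greedy longest-match lookup. The rule table `RULES` and the function `transliterate` are hypothetical names with a deliberately tiny toy mapping, not taken from the cited paper, which uses a statistical, phonology-aware framework rather than hand-written rules.

```python
# Toy Latin -> Cyrillic grapheme rules. This table is an illustrative
# assumption covering only a few letters, not a real transliteration standard.
RULES = {
    "ch": "ч",
    "sh": "ш",
    "a": "а",
    "e": "е",
    "m": "м",
    "n": "н",
    "r": "р",
    "s": "с",
    "t": "т",
}
MAX_LEN = max(len(key) for key in RULES)


def transliterate(word: str) -> str:
    """Greedy longest-match transliteration using the toy RULES table."""
    text = word.lower()
    out = []
    i = 0
    while i < len(text):
        # Try the longest source unit first so "ch" wins over "c" + "h".
        for size in range(MAX_LEN, 0, -1):
            chunk = text[i:i + size]
            if chunk in RULES:
                out.append(RULES[chunk])
                i += len(chunk)
                break
        else:
            # No rule matched: copy the character through unchanged.
            out.append(text[i])
            i += 1
    return "".join(out)


print(transliterate("Manchester"))  # prints "манчестер"
```

Matching the digraph "ch" as one unit is what lets the output approximate the source pronunciation rather than mapping letter by letter; learned systems replace the fixed table with alignments estimated from data.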

Source: Phonology-Augmented Statistical Framework for Machine Transliteration using Limited Linguistic Resources

Papers


Showing 1–25 of 435 papers

Title	Date	Tasks	Status	Hype
GR-NLP-TOOLKIT: An Open-Source NLP Toolkit for Modern Greek	Dec 11, 2024	Dependency Parsing, Morphological Tagging	Code Available	2
Aksharantar: Open Indic-language Transliteration datasets and models for the Next Billion Users	May 6, 2022	Transliteration	Code Available	2
ParsiPy: NLP Toolkit for Historical Persian Texts in Python	Mar 22, 2025	Lemmatization, Part-Of-Speech Tagging	Code Available	1
ParaNames 1.0: Creating an Entity Name Corpus for 400+ Languages using Wikidata	May 15, 2024	Multilingual Named Entity Recognition, named-entity-recognition	Code Available	1
Show Me the World in My Language: Establishing the First Baseline for Scene-Text to Scene-Text Translation	Aug 6, 2023	Machine Translation, Scene Text Editing	Code Available	1
Taqyim: Evaluating Arabic NLP Tasks Using ChatGPT Models	Jun 28, 2023	Part-Of-Speech Tagging, Sentiment Analysis	Code Available	1
DeepScribe: Localization and Classification of Elamite Cuneiform Signs Via Deep Learning	Jun 2, 2023	Transliteration	Code Available	1
Multilingual Text-to-Speech Synthesis for Turkic Languages Using Transliteration	May 25, 2023	Speech Synthesis, text-to-speech	Code Available	1
XTREME-UP: A User-Centric Scarce-Data Benchmark for Under-Represented Languages	May 19, 2023	In-Context Learning, Multilingual NLP	Code Available	1
Beyond Arabic: Software for Perso-Arabic Script Manipulation	Jan 26, 2023	Transliteration	Code Available	1
Question Answering Classification for Amharic Social Media Community Based Questions	Jun 1, 2022	8k, Question Answering	Code Available	1
A machine transliteration tool between Uzbek alphabets	May 19, 2022	Transliteration	Code Available	1
ParaNames: A Massively Multilingual Entity Name Corpus	Feb 28, 2022	named-entity-recognition, Named Entity Recognition	Code Available	1
Sub-Character Tokenization for Chinese Pretrained Language Models	Jun 1, 2021	Chinese Word Segmentation, Computational Efficiency	Code Available	1
KLPT – Kurdish Language Processing Toolkit	Nov 1, 2020	Diversity, Lemmatization	Code Available	1
Leveraging Multilingual News Websites for Building a Kurdish Parallel Corpus	Oct 4, 2020	Articles, Machine Translation	Code Available	1
Processing South Asian Languages Written in the Latin Script: the Dakshina Dataset	Jul 2, 2020	Language Modeling, Language Modelling	Code Available	1
Applying the Transformer to Character-level Transduction	May 20, 2020	Grapheme-to-Phoneme Conversion, Morphological Inflection	Code Available	1
An Ensemble Model of Word-based and Character-based Models for Japanese and Chinese Input Method	Dec 1, 2012	Transliteration	Code Available	1
Optimizing Multilingual Text-To-Speech with Accents & Emotions	Jun 19, 2025	Disentanglement, Emotion Recognition	Unverified	0
Rethinking Hate Speech Detection on Social Media: Can LLMs Replace Traditional Models?	Jun 15, 2025	Hate Speech Detection, Transliteration	Unverified	0
Beyond Specialization: Benchmarking LLMs for Transliteration of Indian Languages	May 26, 2025	Benchmarking, Transliteration	Unverified	0
Lost in Transliteration: Bridging the Script Gap in Neural IR	May 13, 2025	Information Retrieval, Retrieval	Unverified	0
Bridging the Gap: An Intermediate Language for Enhanced and Cost-Effective Grapheme-to-Phoneme Conversion with Homographs with Multiple Pronunciations Disambiguation	May 10, 2025	Grapheme-to-Phoneme Conversion, Large Language Model	Unverified	0
Proper Name Diacritization for Arabic Wikipedia: A Benchmark Dataset	May 5, 2025	Transliteration	Unverified	0


No leaderboard results yet.