Transliteration

Transliteration converts a word from a source (foreign) language into a target language, and often adopts approaches from machine translation. In machine translation, the objective is to preserve the semantic meaning of the utterance as much as possible while following the syntactic structure of the target language. In transliteration, the objective is to preserve the original pronunciation of the source word as much as possible while following the phonological structure of the target language.

For example, the city name “Manchester” has become well known to speakers of languages other than English. Such borrowed words are often named entities that are important in cross-lingual information retrieval, information extraction, and machine translation, and they often present out-of-vocabulary challenges to spoken language technologies such as automatic speech recognition, spoken keyword search, and text-to-speech.

Source: Phonology-Augmented Statistical Framework for Machine Transliteration using Limited Linguistic Resources
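
To make the idea concrete, here is a minimal sketch of rule-based transliteration using greedy longest-match over a grapheme table. It is not the statistical framework from the cited source; the `RULES` table and the `transliterate` function are hypothetical, cover only a handful of Latin-to-Cyrillic correspondences, and work on spelling rather than actual pronunciation.

```python
# Hypothetical toy rule table (an assumption for illustration only):
# Latin graphemes mapped to Russian Cyrillic graphemes.
RULES = {
    "shch": "щ", "ch": "ч", "sh": "ш", "zh": "ж", "kh": "х", "ts": "ц",
    "a": "а", "b": "б", "d": "д", "e": "е", "g": "г", "i": "и", "k": "к",
    "l": "л", "m": "м", "n": "н", "o": "о", "p": "п", "r": "р", "s": "с",
    "t": "т", "u": "у", "v": "в", "z": "з",
}
MAX_LEN = max(len(key) for key in RULES)


def transliterate(word: str) -> str:
    """Map a Latin-script word to Cyrillic by greedy longest-match."""
    word = word.lower()
    out = []
    i = 0
    while i < len(word):
        # Try the longest candidate grapheme first, so "ch" wins over "c" + "h".
        for size in range(min(MAX_LEN, len(word) - i), 0, -1):
            chunk = word[i:i + size]
            if chunk in RULES:
                out.append(RULES[chunk])
                i += size
                break
        else:
            # No rule matched: copy the character through unchanged.
            out.append(word[i])
            i += 1
    return "".join(out)


print(transliterate("Manchester"))  # → манчестер
```

A table like this ignores context and pronunciation, which is why systems such as those in the papers listed below typically learn the mapping from data instead.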

Papers

Showing 1–50 of 435 papers

Title	Date	Tasks	Status	Hype
GR-NLP-TOOLKIT: An Open-Source NLP Toolkit for Modern Greek	Dec 11, 2024	Dependency Parsing, Morphological Tagging	Code Available	2
Aksharantar: Open Indic-language Transliteration datasets and models for the Next Billion Users	May 6, 2022	Transliteration	Code Available	2
ParsiPy: NLP Toolkit for Historical Persian Texts in Python	Mar 22, 2025	Lemmatization, Part-Of-Speech Tagging	Code Available	1
ParaNames 1.0: Creating an Entity Name Corpus for 400+ Languages using Wikidata	May 15, 2024	Multilingual Named Entity Recognition, named-entity-recognition	Code Available	1
Show Me the World in My Language: Establishing the First Baseline for Scene-Text to Scene-Text Translation	Aug 6, 2023	Machine Translation, Scene Text Editing	Code Available	1
Taqyim: Evaluating Arabic NLP Tasks Using ChatGPT Models	Jun 28, 2023	Part-Of-Speech Tagging, Sentiment Analysis	Code Available	1
DeepScribe: Localization and Classification of Elamite Cuneiform Signs Via Deep Learning	Jun 2, 2023	Transliteration	Code Available	1
Multilingual Text-to-Speech Synthesis for Turkic Languages Using Transliteration	May 25, 2023	Speech Synthesis, text-to-speech	Code Available	1
XTREME-UP: A User-Centric Scarce-Data Benchmark for Under-Represented Languages	May 19, 2023	In-Context Learning, Multilingual NLP	Code Available	1
Beyond Arabic: Software for Perso-Arabic Script Manipulation	Jan 26, 2023	Transliteration	Code Available	1
Question Answering Classification for Amharic Social Media Community Based Questions	Jun 1, 2022	8k, Question Answering	Code Available	1
A machine transliteration tool between Uzbek alphabets	May 19, 2022	Transliteration	Code Available	1
ParaNames: A Massively Multilingual Entity Name Corpus	Feb 28, 2022	named-entity-recognition, Named Entity Recognition	Code Available	1
Sub-Character Tokenization for Chinese Pretrained Language Models	Jun 1, 2021	Chinese Word Segmentation, Computational Efficiency	Code Available	1
KLPT – Kurdish Language Processing Toolkit	Nov 1, 2020	Diversity, Lemmatization	Code Available	1
Leveraging Multilingual News Websites for Building a Kurdish Parallel Corpus	Oct 4, 2020	Articles, Machine Translation	Code Available	1
Processing South Asian Languages Written in the Latin Script: the Dakshina Dataset	Jul 2, 2020	Language Modeling, Language Modelling	Code Available	1
Applying the Transformer to Character-level Transduction	May 20, 2020	Grapheme-to-Phoneme Conversion, Morphological Inflection	Code Available	1
An Ensemble Model of Word-based and Character-based Models for Japanese and Chinese Input Method	Dec 1, 2012	Transliteration	Code Available	1
Optimizing Multilingual Text-To-Speech with Accents & Emotions	Jun 19, 2025	Disentanglement, Emotion Recognition	Unverified	0
Rethinking Hate Speech Detection on Social Media: Can LLMs Replace Traditional Models?	Jun 15, 2025	Hate Speech Detection, Transliteration	Unverified	0
Beyond Specialization: Benchmarking LLMs for Transliteration of Indian Languages	May 26, 2025	Benchmarking, Transliteration	Unverified	0
Lost in Transliteration: Bridging the Script Gap in Neural IR	May 13, 2025	Information Retrieval, Retrieval	Unverified	0
Bridging the Gap: An Intermediate Language for Enhanced and Cost-Effective Grapheme-to-Phoneme Conversion with Homographs with Multiple Pronunciations Disambiguation	May 10, 2025	Grapheme-to-Phoneme Conversion, Large Language Model	Unverified	0
Proper Name Diacritization for Arabic Wikipedia: A Benchmark Dataset	May 5, 2025	Transliteration	Unverified	0
Low-Resource Transliteration for Roman-Urdu and Urdu Using Transformer-Based Models	Mar 27, 2025	Information Retrieval, Language Modeling	Unverified	0
Connecting the Persian-speaking World through Transliteration	Feb 27, 2025	Machine Translation, Transliteration	Unverified	0
NusaAksara: A Multimodal and Multilingual Benchmark for Preserving Indonesian Indigenous Scripts	Feb 25, 2025	Image Segmentation, Language Identification	Unverified	0
Linguistic Analysis of Sinhala YouTube Comments on Sinhala Music Videos: A Dataset Study	Jan 28, 2025	Emotion Recognition, Information Retrieval	Unverified	0
IndoNLP 2025: Shared Task on Real-Time Reverse Transliteration for Romanized Indo-Aryan languages	Jan 10, 2025	Transliteration	Unverified	0
When LLMs Struggle: Reference-less Translation Evaluation for Low-resource Languages	Jan 8, 2025	Machine Translation, Transliteration	Unverified	0
Sinhala Transliteration: A Comparative Analysis Between Rule-based and Seq2Seq Approaches	Dec 31, 2024	Decoder, Machine Translation	Code Available	0
LAMA-UT: Language Agnostic Multilingual ASR through Orthography Unification and Language-Specific Transliteration	Dec 19, 2024	Automatic Speech Recognition, Automatic Speech Recognition (ASR)	Unverified	0
Transliterated Zero-Shot Domain Adaptation for Automatic Speech Recognition	Dec 15, 2024	Automatic Speech Recognition, Domain Adaptation	Unverified	0
Romanized to Native Malayalam Script Transliteration Using an Encoder-Decoder Framework	Dec 13, 2024	Decoder, Transliteration	Code Available	0
PolyIPA -- Multilingual Phoneme-to-Grapheme Conversion Model	Dec 12, 2024	Data Augmentation, Information Retrieval	Unverified	0
AyutthayaAlpha: A Thai-Latin Script Transliteration Transformer	Dec 5, 2024	Cross-Lingual Information Retrieval, Information Retrieval	Unverified	0
Towards Cross-Cultural Machine Translation with Retrieval-Augmented Generation from Multilingual Knowledge Graphs	Oct 17, 2024	Knowledge Graphs, Machine Translation	Unverified	0
ChakmaNMT: A Low-resource Machine Translation On Chakma Language	Oct 14, 2024	Benchmarking, Machine Translation	Unverified	0
UniGlyph: A Seven-Segment Script for Universal Language Representation	Oct 11, 2024	Diversity, speech-recognition	Unverified	0
A two-stage transliteration approach to improve performance of a multilingual ASR	Oct 9, 2024	Automatic Speech Recognition, Automatic Speech Recognition (ASR)	Unverified	0
How Transliterations Improve Crosslingual Alignment	Sep 25, 2024	Sentence, Transliteration	Code Available	0
Exploring the Role of Transliteration in In-Context Learning for Low-resource Languages Written in Non-Latin Scripts	Jul 2, 2024	Decoder, In-Context Learning	Unverified	0
Can Small Language Models Learn, Unlearn, and Retain Noise Patterns?	Jul 1, 2024	In-Context Learning, Transliteration	Code Available	0
Breaking the Script Barrier in Multilingual Pre-Trained Language Models with Transliteration-Based Post-Training Alignment	Jun 28, 2024	Cross-Lingual Transfer, Transliteration	Code Available	0
Jailbreaking LLMs with Arabic Transliteration and Arabizi	Jun 26, 2024	Transliteration	Code Available	0
Review of Computational Epigraphy	Jun 3, 2024	Attribute, Transliteration	Unverified	0
TransMI: A Framework to Create Strong Baselines from Multilingual Pretrained Language Models for Transliterated Data	May 16, 2024	Transliteration	Code Available	0
Swa Bhasha: Message-Based Singlish to Sinhala Transliteration	Apr 20, 2024	Transliteration	Unverified	0
Charles Translator: A Machine Translation System between Ukrainian and Czech	Apr 10, 2024	Machine Translation, Translation	Unverified	0


No leaderboard results yet.