SOTAVerified

Transliteration

Transliteration converts a word from a source (foreign) language into a target language, and often adopts approaches from machine translation. In machine translation, the objective is to preserve the semantic meaning of the utterance as much as possible while following the syntactic structure of the target language. In transliteration, the objective is to preserve the original pronunciation of the source word as much as possible while following the phonological structures of the target language.

For example, the city’s name “Manchester” has become well known to speakers of languages other than English. Such borrowed words are often named entities that matter in cross-lingual information retrieval, information extraction, and machine translation, and they often present out-of-vocabulary challenges to spoken language technologies such as automatic speech recognition, spoken keyword search, and text-to-speech.

Source: Phonology-Augmented Statistical Framework for Machine Transliteration using Limited Linguistic Resources
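
As a rough illustration of the idea described above, the sketch below rewrites Latin-script graphemes into Cyrillic ones with a greedy longest-match lookup. It is a toy rule-based example, not the statistical framework of the cited paper: the `RULES` table and the `transliterate` function are hypothetical names invented here, the table covers only the letters needed for one word, and it maps spelling to spelling, so it approximates pronunciation only as far as the hand-written rules do.

```python
# Hypothetical, hand-written rule table (illustrative only, far from complete).
# Multi-letter graphemes such as "ch" must be matched before single letters.
RULES = {
    "ch": "ч",
    "sh": "ш",
    "a": "а",
    "e": "е",
    "m": "м",
    "n": "н",
    "r": "р",
    "s": "с",
    "t": "т",
}


def transliterate(word, rules=RULES):
    """Greedy longest-match rewrite of source graphemes into target graphemes."""
    max_len = max(len(key) for key in rules)
    lowered = word.lower()
    out = []
    i = 0
    while i < len(lowered):
        # Try the longest candidate chunk first, then shorter ones.
        for size in range(min(max_len, len(lowered) - i), 0, -1):
            chunk = lowered[i:i + size]
            if chunk in rules:
                out.append(rules[chunk])
                i += size
                break
        else:
            # No rule matched: pass the character through unchanged.
            out.append(lowered[i])
            i += 1
    return "".join(out)


print(transliterate("Manchester"))  # манчестер
```

Matching the longest grapheme first keeps "ch" from being split into two unrelated letters. A realistic system would learn such correspondences from data and model the target language's phonology rather than rely on a fixed table.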

Papers

Showing 1–50 of 435 papers

Title | Status | Hype
GR-NLP-TOOLKIT: An Open-Source NLP Toolkit for Modern Greek | Code | 2
Aksharantar: Open Indic-language Transliteration datasets and models for the Next Billion Users | Code | 2
ParsiPy: NLP Toolkit for Historical Persian Texts in Python | Code | 1
ParaNames 1.0: Creating an Entity Name Corpus for 400+ Languages using Wikidata | Code | 1
Show Me the World in My Language: Establishing the First Baseline for Scene-Text to Scene-Text Translation | Code | 1
Taqyim: Evaluating Arabic NLP Tasks Using ChatGPT Models | Code | 1
DeepScribe: Localization and Classification of Elamite Cuneiform Signs Via Deep Learning | Code | 1
Multilingual Text-to-Speech Synthesis for Turkic Languages Using Transliteration | Code | 1
XTREME-UP: A User-Centric Scarce-Data Benchmark for Under-Represented Languages | Code | 1
Beyond Arabic: Software for Perso-Arabic Script Manipulation | Code | 1
Question Answering Classification for Amharic Social Media Community Based Questions | Code | 1
A machine transliteration tool between Uzbek alphabets | Code | 1
ParaNames: A Massively Multilingual Entity Name Corpus | Code | 1
Sub-Character Tokenization for Chinese Pretrained Language Models | Code | 1
KLPT – Kurdish Language Processing Toolkit | Code | 1
Leveraging Multilingual News Websites for Building a Kurdish Parallel Corpus | Code | 1
Processing South Asian Languages Written in the Latin Script: the Dakshina Dataset | Code | 1
Applying the Transformer to Character-level Transduction | Code | 1
An Ensemble Model of Word-based and Character-based Models for Japanese and Chinese Input Method | Code | 1
Optimizing Multilingual Text-To-Speech with Accents & Emotions | - | 0
Rethinking Hate Speech Detection on Social Media: Can LLMs Replace Traditional Models? | - | 0
Beyond Specialization: Benchmarking LLMs for Transliteration of Indian Languages | - | 0
Lost in Transliteration: Bridging the Script Gap in Neural IR | - | 0
Bridging the Gap: An Intermediate Language for Enhanced and Cost-Effective Grapheme-to-Phoneme Conversion with Homographs with Multiple Pronunciations Disambiguation | - | 0
Proper Name Diacritization for Arabic Wikipedia: A Benchmark Dataset | - | 0
Low-Resource Transliteration for Roman-Urdu and Urdu Using Transformer-Based Models | - | 0
Connecting the Persian-speaking World through Transliteration | - | 0
NusaAksara: A Multimodal and Multilingual Benchmark for Preserving Indonesian Indigenous Scripts | - | 0
Linguistic Analysis of Sinhala YouTube Comments on Sinhala Music Videos: A Dataset Study | - | 0
IndoNLP 2025: Shared Task on Real-Time Reverse Transliteration for Romanized Indo-Aryan languages | - | 0
When LLMs Struggle: Reference-less Translation Evaluation for Low-resource Languages | - | 0
Sinhala Transliteration: A Comparative Analysis Between Rule-based and Seq2Seq Approaches | Code | 0
LAMA-UT: Language Agnostic Multilingual ASR through Orthography Unification and Language-Specific Transliteration | - | 0
Transliterated Zero-Shot Domain Adaptation for Automatic Speech Recognition | - | 0
Romanized to Native Malayalam Script Transliteration Using an Encoder-Decoder Framework | Code | 0
PolyIPA -- Multilingual Phoneme-to-Grapheme Conversion Model | - | 0
AyutthayaAlpha: A Thai-Latin Script Transliteration Transformer | - | 0
Towards Cross-Cultural Machine Translation with Retrieval-Augmented Generation from Multilingual Knowledge Graphs | - | 0
ChakmaNMT: A Low-resource Machine Translation On Chakma Language | - | 0
UniGlyph: A Seven-Segment Script for Universal Language Representation | - | 0
A two-stage transliteration approach to improve performance of a multilingual ASR | - | 0
How Transliterations Improve Crosslingual Alignment | Code | 0
Exploring the Role of Transliteration in In-Context Learning for Low-resource Languages Written in Non-Latin Scripts | - | 0
Can Small Language Models Learn, Unlearn, and Retain Noise Patterns? | Code | 0
Breaking the Script Barrier in Multilingual Pre-Trained Language Models with Transliteration-Based Post-Training Alignment | Code | 0
Jailbreaking LLMs with Arabic Transliteration and Arabizi | Code | 0
Review of Computational Epigraphy | - | 0
TransMI: A Framework to Create Strong Baselines from Multilingual Pretrained Language Models for Transliterated Data | Code | 0
Swa Bhasha: Message-Based Singlish to Sinhala Transliteration | - | 0
Charles Translator: A Machine Translation System between Ukrainian and Czech | - | 0

No leaderboard results yet.