Transliteration

Transliteration is the conversion of a word from a source (foreign) language into a target language, and it often adopts approaches from machine translation. In machine translation, the objective is to preserve the semantic meaning of the utterance as much as possible while following the syntactic structure of the target language. In transliteration, the objective is to preserve the original pronunciation of the source word as much as possible while following the phonological structures of the target language.

For example, the city name “Manchester” has become well known to speakers of languages other than English. Such new words are often named entities that are important in cross-lingual information retrieval, information extraction, and machine translation, and they often present out-of-vocabulary challenges to spoken language technologies such as automatic speech recognition, spoken keyword search, and text-to-speech.

Source: Phonology-Augmented Statistical Framework for Machine Transliteration using Limited Linguistic Resources
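
As a rough illustration of the definition above, the following Python sketch transliterates a Latin-script word into Cyrillic with a greedy longest-match lookup over a small rule table. The table `RULES` and the function `transliterate` are hypothetical names invented for this example, and the mapping covers only a handful of letter sequences. It is not the method of the cited paper: practical systems are typically statistical or neural and model pronunciation rather than spelling alone.

```python
# Toy rule table (an assumption for illustration): Latin letter
# sequences mapped to Cyrillic letters with a similar sound.
RULES = {
    "ch": "ч",
    "sh": "ш",
    "a": "а",
    "e": "е",
    "m": "м",
    "n": "н",
    "r": "р",
    "s": "с",
    "t": "т",
}


def transliterate(word, rules=RULES):
    """Greedy longest-match transliteration of a single word."""
    word = word.lower()
    max_len = max(len(key) for key in rules)
    out = []
    i = 0
    while i < len(word):
        # Try the longest candidate first so "ch" wins over "c" + "h".
        for size in range(max_len, 0, -1):
            chunk = word[i:i + size]
            if chunk in rules:
                out.append(rules[chunk])
                i += len(chunk)
                break
        else:
            # No rule applies: copy the character through unchanged.
            out.append(word[i])
            i += 1
    return "".join(out)


print(transliterate("Manchester"))
```

Trying multi-character rules first is what lets a digraph such as "ch" map to a single target letter. Characters without a rule are passed through unchanged, which makes gaps in the table easy to spot.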

Papers


Showing 1–50 of 435 papers

Title	Date	Tasks	Status	Hype	Score
GR-NLP-TOOLKIT: An Open-Source NLP Toolkit for Modern Greek	Dec 11, 2024	Dependency Parsing, Morphological Tagging	Code Available	2	5
Aksharantar: Open Indic-language Transliteration datasets and models for the Next Billion Users	May 6, 2022	Transliteration	Code Available	2	5
Taqyim: Evaluating Arabic NLP Tasks Using ChatGPT Models	Jun 28, 2023	Part-Of-Speech Tagging, Sentiment Analysis	Code Available	1	5
An Ensemble Model of Word-based and Character-based Models for Japanese and Chinese Input Method	Dec 1, 2012	Transliteration	Code Available	1	5
Question Answering Classification for Amharic Social Media Community Based Questions	Jun 1, 2022	8k, Question Answering	Code Available	1	5
ParaNames: A Massively Multilingual Entity Name Corpus	Feb 28, 2022	named-entity-recognition, Named Entity Recognition	Code Available	1	5
Multilingual Text-to-Speech Synthesis for Turkic Languages Using Transliteration	May 25, 2023	Speech Synthesis, text-to-speech	Code Available	1	5
Show Me the World in My Language: Establishing the First Baseline for Scene-Text to Scene-Text Translation	Aug 6, 2023	Machine Translation, Scene Text Editing	Code Available	1	5
A machine transliteration tool between Uzbek alphabets	May 19, 2022	Transliteration	Code Available	1	5
XTREME-UP: A User-Centric Scarce-Data Benchmark for Under-Represented Languages	May 19, 2023	In-Context Learning, Multilingual NLP	Code Available	1	5
ParaNames 1.0: Creating an Entity Name Corpus for 400+ Languages using Wikidata	May 15, 2024	Multilingual Named Entity Recognition, named-entity-recognition	Code Available	1	5
Processing South Asian Languages Written in the Latin Script: the Dakshina Dataset	Jul 2, 2020	Language Modeling, Language Modelling	Code Available	1	5
Sub-Character Tokenization for Chinese Pretrained Language Models	Jun 1, 2021	Chinese Word Segmentation, Computational Efficiency	Code Available	1	5
Leveraging Multilingual News Websites for Building a Kurdish Parallel Corpus	Oct 4, 2020	Articles, Machine Translation	Code Available	1	5
DeepScribe: Localization and Classification of Elamite Cuneiform Signs Via Deep Learning	Jun 2, 2023	Transliteration	Code Available	1	5
ParsiPy: NLP Toolkit for Historical Persian Texts in Python	Mar 22, 2025	Lemmatization, Part-Of-Speech Tagging	Code Available	1	5
KLPT – Kurdish Language Processing Toolkit	Nov 1, 2020	Diversity, Lemmatization	Code Available	1	5
Applying the Transformer to Character-level Transduction	May 20, 2020	Grapheme-to-Phoneme Conversion, Morphological Inflection	Code Available	1	5
Beyond Arabic: Software for Perso-Arabic Script Manipulation	Jan 26, 2023	Transliteration	Code Available	1	5
Specializing Multilingual Language Models: An Empirical Study	Jun 16, 2021	Dependency Parsing, named-entity-recognition	Code Available	0	5
Sequence-to-sequence neural network models for transliteration	Oct 29, 2016	Machine Translation, Translation	Code Available	0	5
Sinhala Transliteration: A Comparative Analysis Between Rule-based and Seq2Seq Approaches	Dec 31, 2024	Decoder, Machine Translation	Code Available	0	5
Role of Language Relatedness in Multilingual Fine-tuning of Language Models: A Case Study in Indo-Aryan Languages	Sep 22, 2021	Multiple Choice Question Answering (MCQA), Natural Language Inference	Code Available	0	5
A Multi-cascaded Deep Model for Bilingual SMS Classification	Nov 29, 2019	Classification, General Classification	Code Available	0	5
Orthographic Transliteration for Kabyle Speech Recognition	Nov 1, 2021	speech-recognition, Speech Recognition	Code Available	0	5
On Biasing Transformer Attention Towards Monotonicity	Apr 8, 2021	Grapheme-to-Phoneme Conversion, Morphological Inflection	Code Available	0	5
Romanized to Native Malayalam Script Transliteration Using an Encoder-Decoder Framework	Dec 13, 2024	Decoder, Transliteration	Code Available	0	5
Towards Offensive Language Identification for Dravidian Languages	Apr 1, 2021	Few-Shot Learning, Language Identification	Code Available	0	5
Towards Offensive Language Identification for Tamil Code-Mixed YouTube Comments and Posts	Aug 24, 2021	Language Identification, Transfer Learning	Code Available	0	5
An Empirical Study of Chinese Name Matching and Applications	Jul 1, 2015	Coreference Resolution, Entity Linking	Code Available	0	5
How Transliterations Improve Crosslingual Alignment	Sep 25, 2024	Sentence, Transliteration	Code Available	0	5
Jailbreaking LLMs with Arabic Transliteration and Arabizi	Jun 26, 2024	Transliteration	Code Available	0	5
A Large-scale Evaluation of Neural Machine Transliteration for Indic Languages	Apr 1, 2021	Translation, Transliteration	Code Available	0	5
Exploiting Language Relatedness for Low Web-Resource Language Model Adaptation: An Indic Languages Study	Jun 7, 2021	Data Augmentation, Language Modeling	Code Available	0	5
How Grammatical is Character-level Neural Machine Translation? Assessing MT Quality with Contrastive Translation Pairs	Dec 14, 2016	Machine Translation, NMT	Code Available	0	5
Neural Machine Translation Techniques for Named Entity Transliteration	Jul 1, 2018	Automatic Post-Editing, Decoder	Code Available	0	5
Cross-Lingual Text Classification of Transliterated Hindi and Malayalam	Aug 31, 2021	Benchmarking, Classification	Code Available	0	5
Design Challenges in Named Entity Transliteration	Aug 7, 2018	Decoder, Transliteration	Code Available	0	5
Creating a Translation Matrix of the Bible's Names Across 591 Languages	May 1, 2018	Entity Alignment, Machine Translation	Code Available	0	5
Event detection in Twitter: A keyword volume approach	Jan 3, 2019	Binary Classification, Event Detection	Code Available	0	5
Context Independent Term Mapper for European Languages	Sep 1, 2013	Information Retrieval, Machine Translation	Code Available	0	5
Creating Large-Scale Multilingual Cognate Tables	May 1, 2018	Machine Translation, Semantic Textual Similarity	Code Available	0	5
Does Transliteration Help Multilingual Language Modeling?	Jan 29, 2022	Diversity, Language Modeling	Code Available	0	5
IIITT@Dravidian-CodeMix-FIRE2021: Transliterate or translate? Sentiment analysis of code-mixed text in Dravidian languages	Nov 15, 2021	Marketing, Sentiment Analysis	Code Available	0	5
Bootstrapping Transliteration with Constrained Discovery for Low-Resource Languages	Sep 20, 2018	Entity Linking, Transliteration	Code Available	0	5
Bilingual dictionaries for all EU languages	May 1, 2014	All, Machine Translation	Code Available	0	5
Breaking the Script Barrier in Multilingual Pre-Trained Language Models with Transliteration-Based Post-Training Alignment	Jun 28, 2024	Cross-Lingual Transfer, Transliteration	Code Available	0	5
A Rule-based Kurdish Text Transliteration System	Nov 26, 2018	Transliteration	Code Available	0	5
Can Small Language Models Learn, Unlearn, and Retain Noise Patterns?	Jul 1, 2024	In-Context Learning, Transliteration	Code Available	0	5
Efficient Sequence Labeling with Actor-Critic Training	Sep 30, 2018	Decision Making, NER	Code Available	0	5



No leaderboard results yet.