SOTAVerified

Transliteration

Transliteration converts a word from a source (foreign) language into a target language, and often adopts approaches from machine translation. In machine translation, the objective is to preserve the semantic meaning of the utterance as much as possible while following the syntactic structure of the target language. In transliteration, the objective is to preserve the original pronunciation of the source word as much as possible while following the phonological structure of the target language.

For example, the city name “Manchester” has become well known to speakers of languages other than English. Such new words are often named entities that are important in cross-lingual information retrieval, information extraction, and machine translation, and they often present out-of-vocabulary challenges to spoken language technologies such as automatic speech recognition, spoken keyword search, and text-to-speech.

Source: Phonology-Augmented Statistical Framework for Machine Transliteration using Limited Linguistic Resources
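To make the "preserve the pronunciation" objective above concrete, here is a minimal rule-based sketch in Python. It is not the statistical framework from the cited paper: the mapping table `TOY_RULES` and the function `transliterate` are hypothetical names introduced for illustration, and the table covers only a handful of Cyrillic letters rather than any standard romanization scheme. Real systems typically learn such correspondences from data and handle context-dependent cases.

```python
# Toy Cyrillic-to-Latin rules. This table is an illustrative assumption,
# deliberately incomplete, and not a standard romanization scheme.
TOY_RULES = {
    "а": "a",
    "в": "v",
    "и": "i",
    "к": "k",
    "м": "m",
    "о": "o",
    "с": "s",
    "ш": "sh",  # one source letter may map to several target letters
}


def transliterate(word: str, rules: dict = TOY_RULES) -> str:
    """Map each source character to a target-script string that
    approximates its sound. Characters without a rule pass through."""
    out = []
    for ch in word:
        lower = ch.lower()
        mapped = rules.get(lower, ch)
        # Carry capitalization over to the mapped output.
        if ch != lower and lower in rules:
            mapped = mapped.capitalize()
        out.append(mapped)
    return "".join(out)


if __name__ == "__main__":
    print(transliterate("Москва"))
    print(transliterate("Шашки"))
```

A character-by-character table like this ignores context, which is one reason the task is usually framed as a sequence-to-sequence problem borrowed from machine translation.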

Papers

Showing 1-25 of 435 papers

Title | Status | Hype
Aksharantar: Open Indic-language Transliteration datasets and models for the Next Billion Users | Code | 2
GR-NLP-TOOLKIT: An Open-Source NLP Toolkit for Modern Greek | Code | 2
ParaNames: A Massively Multilingual Entity Name Corpus | Code | 1
Sub-Character Tokenization for Chinese Pretrained Language Models | Code | 1
Show Me the World in My Language: Establishing the First Baseline for Scene-Text to Scene-Text Translation | Code | 1
Processing South Asian Languages Written in the Latin Script: the Dakshina Dataset | Code | 1
Multilingual Text-to-Speech Synthesis for Turkic Languages Using Transliteration | Code | 1
Applying the Transformer to Character-level Transduction | Code | 1
ParsiPy: NLP Toolkit for Historical Persian Texts in Python | Code | 1
Question Answering Classification for Amharic Social Media Community Based Questions | Code | 1
DeepScribe: Localization and Classification of Elamite Cuneiform Signs Via Deep Learning | Code | 1
XTREME-UP: A User-Centric Scarce-Data Benchmark for Under-Represented Languages | Code | 1
KLPT – Kurdish Language Processing Toolkit | Code | 1
A machine transliteration tool between Uzbek alphabets | Code | 1
ParaNames 1.0: Creating an Entity Name Corpus for 400+ Languages using Wikidata | Code | 1
An Ensemble Model of Word-based and Character-based Models for Japanese and Chinese Input Method | Code | 1
Leveraging Multilingual News Websites for Building a Kurdish Parallel Corpus | Code | 1
Beyond Arabic: Software for Perso-Arabic Script Manipulation | Code | 1
Taqyim: Evaluating Arabic NLP Tasks Using ChatGPT Models | Code | 1
Event detection in Twitter: A keyword volume approach | Code | 0
Efficient Sequence Labeling with Actor-Critic Training | Code | 0
Exploiting Language Relatedness for Low Web-Resource Language Model Adaptation: An Indic Languages Study | Code | 0
Design Challenges in Named Entity Transliteration | Code | 0
A Multi-cascaded Deep Model for Bilingual SMS Classification | Code | 0
Cross-Lingual Text Classification of Transliterated Hindi and Malayalam | Code | 0

No leaderboard results yet.