GR-NLP-TOOLKIT: An Open-Source NLP Toolkit for Modern Greek | Dec 11, 2024 | Dependency Parsing, Morphological Tagging | Code Available 2
Aksharantar: Open Indic-language Transliteration datasets and models for the Next Billion Users | May 6, 2022 | Transliteration | Code Available 2
ParsiPy: NLP Toolkit for Historical Persian Texts in Python | Mar 22, 2025 | Lemmatization, Part-Of-Speech Tagging | Code Available 1
ParaNames 1.0: Creating an Entity Name Corpus for 400+ Languages using Wikidata | May 15, 2024 | Multilingual Named Entity Recognition, named-entity-recognition | Code Available 1
Show Me the World in My Language: Establishing the First Baseline for Scene-Text to Scene-Text Translation | Aug 6, 2023 | Machine Translation, Scene Text Editing | Code Available 1
Taqyim: Evaluating Arabic NLP Tasks Using ChatGPT Models | Jun 28, 2023 | Part-Of-Speech Tagging, Sentiment Analysis | Code Available 1
DeepScribe: Localization and Classification of Elamite Cuneiform Signs Via Deep Learning | Jun 2, 2023 | Transliteration | Code Available 1
Multilingual Text-to-Speech Synthesis for Turkic Languages Using Transliteration | May 25, 2023 | Speech Synthesis, text-to-speech | Code Available 1
XTREME-UP: A User-Centric Scarce-Data Benchmark for Under-Represented Languages | May 19, 2023 | In-Context Learning, Multilingual NLP | Code Available 1
Beyond Arabic: Software for Perso-Arabic Script Manipulation | Jan 26, 2023 | Transliteration | Code Available 1
Question Answering Classification for Amharic Social Media Community Based Questions | Jun 1, 2022 | 8k, Question Answering | Code Available 1
A machine transliteration tool between Uzbek alphabets | May 19, 2022 | Transliteration | Code Available 1
ParaNames: A Massively Multilingual Entity Name Corpus | Feb 28, 2022 | named-entity-recognition, Named Entity Recognition | Code Available 1
Sub-Character Tokenization for Chinese Pretrained Language Models | Jun 1, 2021 | Chinese Word Segmentation, Computational Efficiency | Code Available 1
KLPT – Kurdish Language Processing Toolkit | Nov 1, 2020 | Diversity, Lemmatization | Code Available 1
Leveraging Multilingual News Websites for Building a Kurdish Parallel Corpus | Oct 4, 2020 | Articles, Machine Translation | Code Available 1
Processing South Asian Languages Written in the Latin Script: the Dakshina Dataset | Jul 2, 2020 | Language Modeling, Language Modelling | Code Available 1
Applying the Transformer to Character-level Transduction | May 20, 2020 | Grapheme-to-Phoneme Conversion, Morphological Inflection | Code Available 1
An Ensemble Model of Word-based and Character-based Models for Japanese and Chinese Input Method | Dec 1, 2012 | Transliteration | Code Available 1
Optimizing Multilingual Text-To-Speech with Accents & Emotions | Jun 19, 2025 | Disentanglement, Emotion Recognition | Unverified 0
Rethinking Hate Speech Detection on Social Media: Can LLMs Replace Traditional Models? | Jun 15, 2025 | Hate Speech Detection, Transliteration | Unverified 0
Beyond Specialization: Benchmarking LLMs for Transliteration of Indian Languages | May 26, 2025 | Benchmarking, Transliteration | Unverified 0
Lost in Transliteration: Bridging the Script Gap in Neural IR | May 13, 2025 | Information Retrieval, Retrieval | Unverified 0
Bridging the Gap: An Intermediate Language for Enhanced and Cost-Effective Grapheme-to-Phoneme Conversion with Homographs with Multiple Pronunciations Disambiguation | May 10, 2025 | Grapheme-to-Phoneme Conversion, Large Language Model | Unverified 0
Proper Name Diacritization for Arabic Wikipedia: A Benchmark Dataset | May 5, 2025 | Transliteration | Unverified 0
Low-Resource Transliteration for Roman-Urdu and Urdu Using Transformer-Based Models | Mar 27, 2025 | Information Retrieval, Language Modeling | Unverified 0
Connecting the Persian-speaking World through Transliteration | Feb 27, 2025 | Machine Translation, Transliteration | Unverified 0
NusaAksara: A Multimodal and Multilingual Benchmark for Preserving Indonesian Indigenous Scripts | Feb 25, 2025 | Image Segmentation, Language Identification | Unverified 0
Linguistic Analysis of Sinhala YouTube Comments on Sinhala Music Videos: A Dataset Study | Jan 28, 2025 | Emotion Recognition, Information Retrieval | Unverified 0
IndoNLP 2025: Shared Task on Real-Time Reverse Transliteration for Romanized Indo-Aryan languages | Jan 10, 2025 | Transliteration | Unverified 0
When LLMs Struggle: Reference-less Translation Evaluation for Low-resource Languages | Jan 8, 2025 | Machine Translation, Transliteration | Unverified 0
Sinhala Transliteration: A Comparative Analysis Between Rule-based and Seq2Seq Approaches | Dec 31, 2024 | Decoder, Machine Translation | Code Available 0
LAMA-UT: Language Agnostic Multilingual ASR through Orthography Unification and Language-Specific Transliteration | Dec 19, 2024 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified 0
Transliterated Zero-Shot Domain Adaptation for Automatic Speech Recognition | Dec 15, 2024 | Automatic Speech Recognition, Domain Adaptation | Unverified 0
Romanized to Native Malayalam Script Transliteration Using an Encoder-Decoder Framework | Dec 13, 2024 | Decoder, Transliteration | Code Available 0
PolyIPA -- Multilingual Phoneme-to-Grapheme Conversion Model | Dec 12, 2024 | Data Augmentation, Information Retrieval | Unverified 0
AyutthayaAlpha: A Thai-Latin Script Transliteration Transformer | Dec 5, 2024 | Cross-Lingual Information Retrieval, Information Retrieval | Unverified 0
Towards Cross-Cultural Machine Translation with Retrieval-Augmented Generation from Multilingual Knowledge Graphs | Oct 17, 2024 | Knowledge Graphs, Machine Translation | Unverified 0
ChakmaNMT: A Low-resource Machine Translation On Chakma Language | Oct 14, 2024 | Benchmarking, Machine Translation | Unverified 0
UniGlyph: A Seven-Segment Script for Universal Language Representation | Oct 11, 2024 | Diversity, speech-recognition | Unverified 0
A two-stage transliteration approach to improve performance of a multilingual ASR | Oct 9, 2024 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified 0
How Transliterations Improve Crosslingual Alignment | Sep 25, 2024 | Sentence, Transliteration | Code Available 0
Exploring the Role of Transliteration in In-Context Learning for Low-resource Languages Written in Non-Latin Scripts | Jul 2, 2024 | Decoder, In-Context Learning | Unverified 0
Can Small Language Models Learn, Unlearn, and Retain Noise Patterns? | Jul 1, 2024 | In-Context Learning, Transliteration | Code Available 0
Breaking the Script Barrier in Multilingual Pre-Trained Language Models with Transliteration-Based Post-Training Alignment | Jun 28, 2024 | Cross-Lingual Transfer, Transliteration | Code Available 0
Jailbreaking LLMs with Arabic Transliteration and Arabizi | Jun 26, 2024 | Transliteration | Code Available 0
Review of Computational Epigraphy | Jun 3, 2024 | Attribute, Transliteration | Unverified 0
TransMI: A Framework to Create Strong Baselines from Multilingual Pretrained Language Models for Transliterated Data | May 16, 2024 | Transliteration | Code Available 0
Swa Bhasha: Message-Based Singlish to Sinhala Transliteration | Apr 20, 2024 | Transliteration | Unverified 0
Charles Translator: A Machine Translation System between Ukrainian and Czech | Apr 10, 2024 | Machine Translation, Translation | Unverified 0