| Investigating model performance in language identification: beyond simple error statistics | May 30, 2023 | Language Identification | Code Available | 0 |
| MERLIon CCS Challenge: A English-Mandarin code-switching child-directed speech corpus for language identification and diarization | May 30, 2023 | Language Identification | Code Available | 0 |
| Script Normalization for Unconventional Writing of Under-Resourced Languages in Bilingual Communities | May 25, 2023 | Language Identification, Machine Translation | Code Available | 0 |
| Bhasha-Abhijnaanam: Native-script and romanized Language Identification for 22 Indic languages | May 25, 2023 | Language Identification | Code Available | 1 |
| An Open Dataset and Model for Language Identification | May 23, 2023 | Language Identification, model | Code Available | 1 |
| LIMIT: Language Identification, Misidentification, and Translation using Hierarchical Models in 350+ Languages | May 23, 2023 | Language Identification, Translation | Code Available | 0 |
| Multilingual Large Language Models Are Not (Yet) Code-Switchers | May 23, 2023 | Benchmarking, Language Identification | Unverified | 0 |
| Scaling Speech Technology to 1,000+ Languages | May 22, 2023 | Automatic Speech Recognition, Language Identification | Code Available | 1 |
| ML-SUPERB: Multilingual Speech Universal PERformance Benchmark | May 18, 2023 | Automatic Speech Recognition, Language Identification | Unverified | 0 |
| DocLangID: Improving Few-Shot Training to Identify the Language of Historical Documents | May 3, 2023 | Few-Shot Learning, Language Identification | Code Available | 0 |
| Lessons Learned in ATCO2: 5000 hours of Air Traffic Control Communications for Robust Automatic Speech Recognition and Understanding | May 2, 2023 | Automatic Speech Recognition, Language Identification | Unverified | 0 |
| Approaches to Corpus Creation for Low-Resource Language Technology: the Case of Southern Kurdish and Laki | Apr 3, 2023 | Language Identification | Code Available | 0 |
| PALI: A Language Identification Benchmark for Perso-Arabic Scripts | Apr 3, 2023 | Language Identification | Code Available | 1 |
| MMT: A Multilingual and Multi-Topic Indian Social Media Dataset | Apr 2, 2023 | Diversity, Language Identification | Unverified | 0 |
| Joint unsupervised and supervised learning for context-aware language identification | Mar 29, 2023 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0 |
| Language Variety Identification with True Labels | Mar 2, 2023 | Language Identification | Code Available | 0 |
| Building High-accuracy Multilingual ASR with Gated Language Experts and Curriculum Training | Mar 1, 2023 | Language Identification | Unverified | 0 |
| Augmented Transformers with Adaptive n-grams Embedding for Multilingual Scene Text Recognition | Feb 28, 2023 | Language Identification, Scene Text Recognition | Unverified | 0 |
| Language identification as improvement for lip-based biometric visual systems | Feb 27, 2023 | Language Identification | Unverified | 0 |
| Improving Spoken Language Identification with Map-Mix | Feb 16, 2023 | Data Augmentation, Language Identification | Code Available | 1 |
| Cross-Corpora Spoken Language Identification with Domain Diversification and Generalization | Feb 10, 2023 | Data Augmentation, Domain Generalization | Unverified | 0 |
| A Twitter BERT Approach for Offensive Language Detection in Marathi | Dec 20, 2022 | Data Augmentation, Language Identification | Unverified | 0 |
| SOLD: Sinhala Offensive Language Dataset | Dec 1, 2022 | Language Identification, Sentence | Code Available | 1 |
| An Overview of Indian Spoken Language Recognition from Machine Learning Perspective | Nov 30, 2022 | Language Identification, Spoken language identification | Unverified | 0 |
| Transformer-based Model for Word Level Language Identification in Code-mixed Kannada-English Texts | Nov 26, 2022 | Language Identification | Unverified | 0 |
| Predicting the Type and Target of Offensive Social Media Posts in Marathi | Nov 22, 2022 | Language Identification | Code Available | 0 |
| Overview of the HASOC Subtrack at FIRE 2022: Offensive Language Identification in Marathi | Nov 18, 2022 | Language Identification | Unverified | 0 |
| Scaling Native Language Identification with Transformer Adapters | Nov 18, 2022 | Language Identification, Marketing | Unverified | 0 |
| CoLI-Machine Learning Approaches for Code-mixed Language Identification at the Word Level in Kannada-English Texts | Nov 17, 2022 | Language Identification, Sentence | Unverified | 0 |
| Accidental Learners: Spoken Language Identification in Multilingual Self-Supervised Models | Nov 9, 2022 | Language Identification, Spoken language identification | Unverified | 0 |
| LAMASSU: Streaming Language-Agnostic Multilingual Speech Recognition and Translation Using Neural Transducers | Nov 5, 2022 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0 |
| A Compact End-to-End Model with Local and Global Context for Spoken Language Identification | Oct 27, 2022 | Language Identification, Spoken language identification | Unverified | 0 |
| AfroLID: A Neural Language Identification Tool for African Languages | Oct 21, 2022 | Language Identification | Code Available | 1 |
| Italian Language and Dialect Identification and Regional French Variety Detection using Adaptive Naive Bayes | Oct 1, 2022 | Dialect Identification, Language Identification | Code Available | 0 |
| Neural Networks for Cross-domain Language Identification. Phlyers @Vardial 2022 | Oct 1, 2022 | Language Identification | Unverified | 0 |
| The Curious Case of Logistic Regression for Italian Languages and Dialects Identification | Oct 1, 2022 | Language Identification, Machine Translation | Code Available | 0 |
| OcWikiDisc: a Corpus of Wikipedia Talk Pages in Occitan | Oct 1, 2022 | 8k, Language Identification | Unverified | 0 |
| The first neural machine translation system for the Erzya language | Sep 19, 2022 | Language Identification, Machine Translation | Code Available | 1 |
| Streaming End-to-End Multilingual Speech Recognition with Joint Language Identification | Sep 13, 2022 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0 |
| Evaluation of Off-the-Shelf Language Identification Tools on Bulgarian Social Media Posts | Sep 1, 2022 | Language Identification | Unverified | 0 |
| IndicSUPERB: A Speech Processing Universal Performance Benchmark for Indian languages | Aug 24, 2022 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 1 |
| Unravelling Interlanguage Facts via Explainable Machine Learning | Aug 2, 2022 | BIG-bench Machine Learning, Language Identification | Unverified | 0 |
| Extending RNN-T-based speech recognition systems with emotion and language classification | Jul 28, 2022 | Emotion Classification, Emotion Recognition | Unverified | 0 |
| Distilled Non-Semantic Speech Embeddings with Binary Neural Networks for Low-Resource Devices | Jul 12, 2022 | Emotion Recognition, Keyword Spotting | Code Available | 0 |
| Huqariq: A Multilingual Speech Corpus of Native Languages of Peru for Speech Recognition | Jul 12, 2022 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0 |
| TechSSN at SemEval-2022 Task 6: Intended Sarcasm Detection using Transformer Models | Jul 1, 2022 | Language Identification, Sarcasm Detection | Unverified | 0 |
| TweetNLP: Cutting-Edge Natural Language Processing for Social Media | Jun 29, 2022 | Language Identification, Named Entity Recognition | Code Available | 2 |
| Language Identification for Austronesian Languages | Jun 9, 2022 | Language Identification | Code Available | 0 |
| Word-level Language Identification Using Subword Embeddings for Code-mixed Bangla-English Social Media Data | Jun 1, 2022 | Language Identification, POS | Code Available | 1 |
| CoSwID, a Code Switching Identification Method Suitable for Under-Resourced Languages | Jun 1, 2022 | Language Identification | Unverified | 0 |