| FastSpell: the LangId Magic Spell | Apr 12, 2024 | Language Identification | Code Available | 1 |
| What is Learnt by the LEArnable Front-end (LEAF)? Adapting Per-Channel Energy Normalisation (PCEN) to Noisy Conditions | Apr 10, 2024 | Emotion Recognition, Keyword Spotting | Code Available | 0 |
| Geographically-Informed Language Identification | Mar 14, 2024 | Language Identification | Code Available | 0 |
| More than words: Advancements and challenges in speech recognition for singing | Mar 14, 2024 | Keyword Spotting, Language Identification | Unverified | 0 |
| Validating and Exploring Large Geographic Corpora | Mar 13, 2024 | Language Identification, Outlier Detection | Unverified | 0 |
| Aligning Speech to Languages to Enhance Code-switching Speech Recognition | Mar 9, 2024 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0 |
| Language and Speech Technology for Central Kurdish Varieties | Mar 4, 2024 | Automatic Speech Recognition, Diversity | Code Available | 1 |
| KInIT at SemEval-2024 Task 8: Fine-tuned LLMs for Multilingual Machine-Generated Text Detection | Feb 21, 2024 | Language Identification, parameter-efficient fine-tuning | Code Available | 1 |
| OWSM-CTC: An Open Encoder-Only Speech Foundation Model for Speech Recognition, Translation, and Language Identification | Feb 20, 2024 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 0 |
| Code-Switched Language Identification is Harder Than You Think | Feb 2, 2024 | Language Identification, Sentence | Code Available | 0 |
| Detecting Structured Language Alternations in Historical Documents by Combining Language Identification with Fourier Analysis | Jan 25, 2024 | Language Identification | Unverified | 0 |
| Acoustic characterization of speech rhythm: going beyond metrics with recurrent neural networks | Jan 22, 2024 | Language Identification, Rhythm | Unverified | 0 |
| AboutMe: Using Self-Descriptions in Webpages to Document the Effects of English Pretraining Data Filters | Jan 12, 2024 | Language Identification | Code Available | 1 |
| Language Detection for Transliterated Content | Jan 9, 2024 | Language Identification, Transliteration | Unverified | 0 |
| MathPile: A Billion-Token-Scale Pretraining Corpus for Math | Dec 28, 2023 | Language Identification, Math | Code Available | 2 |
| Generative linguistic representation for spoken language identification | Dec 18, 2023 | Decoder, Language Identification | Unverified | 0 |
| Cross-Linguistic Offensive Language Detection: BERT-Based Analysis of Bengali, Assamese, & Bodo Conversational Hateful Content from Social Media | Dec 16, 2023 | Language Identification | Unverified | 0 |
| Leveraging Language ID to Calculate Intermediate CTC Loss for Enhanced Code-Switching Speech Recognition | Dec 15, 2023 | Automatic Speech Recognition, Language Identification | Unverified | 0 |
| Attention-Guided Adaptation for Code-Switching Speech Recognition | Dec 14, 2023 | Language Identification, speech-recognition | Unverified | 0 |
| Native Language Identification with Large Language Models | Dec 13, 2023 | Language Acquisition, Language Identification | Unverified | 0 |
| Self-supervised Adaptive Pre-training of Multilingual Speech Models for Language and Dialect Identification | Dec 12, 2023 | Automatic Speech Recognition, Dialect Identification | Unverified | 0 |
| A Text-to-Text Model for Multilingual Offensive Language Identification | Dec 6, 2023 | Decoder, Language Identification | Unverified | 0 |
| Offensive Language Identification in Transliterated and Code-Mixed Bangla | Nov 25, 2023 | Language Identification | Unverified | 0 |
| The Obscure Limitation of Modular Multilingual Language Models | Nov 21, 2023 | Language Identification | Unverified | 0 |
| Fumbling in Babel: An Investigation into ChatGPT's Language Identification Ability | Nov 16, 2023 | Language Identification | Unverified | 0 |
| OffMix-3L: A Novel Code-Mixed Dataset in Bangla-English-Hindi for Offensive Language Identification | Oct 27, 2023 | Language Identification | Code Available | 0 |
| GlotLID: Language Identification for Low-Resource Languages | Oct 24, 2023 | Dialect Identification, Language Identification | Code Available | 1 |
| Advanced accent/dialect identification and accentedness assessment with multi-embedding models and automatic speech recognition | Oct 17, 2023 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0 |
| Findings of the 2023 ML-SUPERB Challenge: Pre-Training and Evaluation over More Languages and Beyond | Oct 9, 2023 | Language Identification, speech-recognition | Unverified | 0 |
| Wavelet Scattering Transform for Improving Generalization in Low-Resourced Spoken Language Identification | Oct 1, 2023 | Language Identification, Spoken language identification | Unverified | 0 |
| Multimodal Modeling For Spoken Language Identification | Sep 19, 2023 | Language Identification, Spoken language identification | Unverified | 0 |
| CulturaX: A Cleaned, Enormous, and Multilingual Dataset for Large Language Models in 167 Languages | Sep 17, 2023 | Hallucination, Language Identification | Unverified | 0 |
| Visual Speech Recognition for Languages with Limited Labeled Data using Automatic Labels from Whisper | Sep 15, 2023 | Language Identification, speech-recognition | Code Available | 1 |
| Native Language Identification with Big Bird Embeddings | Sep 13, 2023 | Computational Efficiency, Feature Engineering | Code Available | 0 |
| Robust Open-Set Spoken Language Identification and the CU MultiLang Dataset | Aug 29, 2023 | Language Identification, Spoken language identification | Unverified | 0 |
| Fine-Tuning Llama 2 Large Language Models for Detecting Online Sexual Predatory Chats and Abusive Texts | Aug 28, 2023 | Abusive Language, Fake News Detection | Unverified | 0 |
| Bilingual Streaming ASR with Grapheme units and Auxiliary Monolingual Loss | Aug 11, 2023 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0 |
| Turkish Native Language Identification | Jul 27, 2023 | Language Identification, Native Language Identification | Unverified | 0 |
| MASR: Multi-label Aware Speech Representation | Jul 20, 2023 | Emotion Recognition, Language Identification | Unverified | 0 |
| Multilingual Speech-to-Speech Translation into Multiple Target Languages | Jul 17, 2023 | Language Identification, Speech-to-Speech Translation | Unverified | 0 |
| Towards spoken dialect identification of Irish | Jul 14, 2023 | Dialect Identification, Language Identification | Unverified | 0 |
| Confidence-based Ensembles of End-to-End Speech Recognition Models | Jun 27, 2023 | Language Identification, Model Selection | Unverified | 0 |
| My Boli: Code-mixed Marathi-English Corpora, Pretrained Language Models and Evaluation Benchmarks | Jun 24, 2023 | Benchmarking, Hate Speech Detection | Code Available | 0 |
| Unified model for code-switching speech recognition and language identification based on a concatenated tokenizer | Jun 14, 2023 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 0 |
| RoBERTweet: A BERT Language Model for Romanian Tweets | Jun 11, 2023 | Language Identification, Language Modeling | Unverified | 0 |
| Leveraging Language Identification to Enhance Code-Mixed Text Classification | Jun 8, 2023 | Classification, Hate Speech Detection | Unverified | 0 |
| Label Aware Speech Representation Learning For Language Identification | Jun 7, 2023 | Language Identification, Missing Labels | Unverified | 0 |
| Spoken Language Identification System for English-Mandarin Code-Switching Child-Directed Speech | Jun 1, 2023 | Decoder, Language Identification | Code Available | 0 |
| Simple yet Effective Code-Switching Language Identification with Multitask Pre-Training and Transfer Learning | May 31, 2023 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0 |
| MERLIon CCS Challenge Evaluation Plan | May 31, 2023 | Language Identification, Task 2 | Code Available | 0 |