| mSTEB: Massively Multilingual Evaluation of LLMs on Speech and Text Tasks | Jun 10, 2025 | Language Identification, Question Answering | Unverified | 0 |
| Neighbors and relatives: How do speech embeddings reflect linguistic connections across the world? | Jun 10, 2025 | Language Identification | Unverified | 0 |
| Recursive Semantic Anchoring in ISO 639:2023: A Structural Extension to ISO/TC 37 Frameworks | Jun 7, 2025 | Language Identification | Unverified | 0 |
| TalTech Systems for the Interspeech 2025 ML-SUPERB 2.0 Challenge | Jun 2, 2025 | Language Identification, speech-recognition | Unverified | 0 |
| Improving Multilingual Speech Models on ML-SUPERB 2.0: Fine-tuning with Data Augmentation and LID-Aware CTC | May 30, 2025 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0 |
| CosyVoice 3: Towards In-the-wild Speech Generation via Scaling-up and Post-training | May 23, 2025 | Automatic Speech Recognition, Emotion Recognition | Code Available | 11 |
| Token Masking Improves Transformer-Based Text Classification | May 16, 2025 | Attribute, Classification | Unverified | 0 |
| Advancing Uto-Aztecan Language Technologies: A Case Study on the Endangered Comanche Language | May 10, 2025 | Language Identification, Synthetic Data Generation | Code Available | 0 |
| Improving Informally Romanized Language Identification | Apr 30, 2025 | Language Identification | Unverified | 0 |
| (Im)possibility of Automated Hallucination Detection in Large Language Models | Apr 23, 2025 | Hallucination, Language Identification | Unverified | 0 |
| COMI-LINGUA: Expert Annotated Large-Scale Dataset for Multitask NLP in Hindi-English Code-Mixing | Mar 27, 2025 | Language Identification, named-entity-recognition | Unverified | 0 |
| KréyoLID From Language Identification Towards Language Mining | Mar 9, 2025 | Language Identification, Multi-class Classification | Code Available | 0 |
| NusaAksara: A Multimodal and Multilingual Benchmark for Preserving Indonesian Indigenous Scripts | Feb 25, 2025 | Image Segmentation, Language Identification | Unverified | 0 |
| English Please: Evaluating Machine Translation with Large Language Models for Multilingual Bug Reports | Feb 20, 2025 | Domain Adaptation, Language Identification | Code Available | 0 |
| Multi-label Scandinavian Language Identification (SLIDE) | Feb 10, 2025 | Language Identification, Sentence | Code Available | 0 |
| On the use of Performer and Agent Attention for Spoken Language Identification | Feb 9, 2025 | Language Identification, Self-Supervised Learning | Unverified | 0 |
| Evaluating Standard and Dialectal Frisian ASR: Multilingual Fine-tuning and Language Identification for Improved Low-resource Performance | Feb 7, 2025 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0 |
| Is It Navajo? Accurate Language Detection in Endangered Athabaskan Languages | Jan 27, 2025 | Diversity, Language Identification | Code Available | 0 |
| Fleurs-SLU: A Massively Multilingual Benchmark for Spoken Language Understanding | Jan 10, 2025 | Automatic Speech Recognition, Classification | Code Available | 0 |
| Indonesian-English Code-Switching Speech Synthesizer Utilizing Multilingual STEN-TTS and Bert LID | Dec 26, 2024 | Language Identification, text-to-speech | Unverified | 0 |
| Enhancing Code-Switching ASR Leveraging Non-Peaky CTC Loss and Deep Language Posterior Injection | Nov 26, 2024 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0 |
| Exploring Facets of Language Generation in the Limit | Nov 22, 2024 | Language Identification, Text Generation | Unverified | 0 |
| Can adversarial attacks by large language models be attributed? | Nov 12, 2024 | Attribute, Language Identification | Unverified | 0 |
| Prompt Engineering Using GPT for Word-Level Code-Mixed Language Identification in Low-Resource Dravidian Languages | Nov 6, 2024 | Information Retrieval, Language Identification | Unverified | 0 |
| GlotCC: An Open Broad-Coverage CommonCrawl Corpus and Pipeline for Minority Languages | Oct 31, 2024 | Language Identification | Code Available | 1 |
| Computational Approaches to Arabic-English Code-Switching | Oct 17, 2024 | Data Augmentation, Language Identification | Unverified | 0 |
| Generation through the lens of learning theory | Oct 17, 2024 | Language Identification, Learning Theory | Unverified | 0 |
| A Multi-Task Text Classification Pipeline with Natural Language Explanations: A User-Centric Evaluation in Sentiment Analysis and Offensive Language Identification in Greek Tweets | Oct 14, 2024 | Feature Importance, Language Identification | Unverified | 0 |
| From N-grams to Pre-trained Multilingual Models For Language Identification | Oct 11, 2024 | Language Identification, XLM-R | Code Available | 0 |
| Efficiently Identifying Low-Quality Language Subsets in Multilingual Datasets: A Case Study on a Large-Scale Multilingual Audio Dataset | Oct 5, 2024 | Language Identification | Unverified | 0 |
| AfriHuBERT: A self-supervised speech representation model for African languages | Sep 30, 2024 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 0 |
| Improving Multilingual ASR in the Wild Using Simple N-best Re-ranking | Sep 27, 2024 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 0 |
| Leveraging Open-Source Large Language Models for Native Language Identification | Sep 15, 2024 | Feature Engineering, Language Acquisition | Unverified | 0 |
| Enhancing Code-Switching Speech Recognition with LID-Based Collaborative Mixture of Experts Model | Sep 3, 2024 | Language Identification, Mixture-of-Experts | Unverified | 0 |
| Literary and Colloquial Dialect Identification for Tamil using Acoustic Features | Aug 27, 2024 | Automatic Speech Recognition, Dialect Identification | Unverified | 0 |
| Language-Informed Beam Search Decoding for Multilingual Machine Translation | Aug 11, 2024 | Language Identification, Machine Translation | Code Available | 1 |
| Speech-MASSIVE: A Multilingual Speech Dataset for SLU and Beyond | Aug 7, 2024 | Benchmarking, Language Identification | Code Available | 1 |
| Towards Generalized Offensive Language Identification | Jul 26, 2024 | Language Identification | Unverified | 0 |
| A Recipe of Parallel Corpora Exploitation for Multilingual Large Language Models | Jun 29, 2024 | Language Identification, Machine Translation | Unverified | 0 |
| SC-MoE: Switch Conformer Mixture of Experts for Unified Streaming and Non-streaming Code-Switching ASR | Jun 26, 2024 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0 |
| Script-Agnostic Language Identification | Jun 25, 2024 | Language Identification | Code Available | 0 |
| Rapid Language Adaptation for Multilingual E2E Speech Recognition Using Encoder Prompting | Jun 18, 2024 | Decoder, Language Identification | Unverified | 0 |
| An Initial Investigation of Language Adaptation for TTS Systems under Low-resource Scenarios | Jun 13, 2024 | Language Identification, Self-Supervised Learning | Code Available | 2 |
| Exploring Spoken Language Identification Strategies for Automatic Transcription of Multilingual Broadcast and Institutional Speech | Jun 13, 2024 | Language Identification, speaker-diarization | Unverified | 0 |
| Soft Language Identification for Language-Agnostic Many-to-One End-to-End Speech Translation | Jun 12, 2024 | Language Identification | Unverified | 0 |
| ML-SUPERB 2.0: Benchmarking Multilingual Speech Models Across Modeling Constraints, Languages, and Datasets | Jun 12, 2024 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0 |
| MaskLID: Code-Switching Language Identification through Iterative Masking | Jun 10, 2024 | Language Identification, Sentence | Code Available | 1 |
| Malayalam Sign Language Identification using Finetuned YOLOv8 and Computer Vision Techniques | May 8, 2024 | Language Identification | Unverified | 0 |
| Whispy: Adapting STT Whisper Models to Real-Time Environments | May 6, 2024 | Action Detection, Activity Detection | Unverified | 0 |
| A Federated Learning Approach to Privacy Preserving Offensive Language Identification | Apr 17, 2024 | Federated Learning, Language Identification | Unverified | 0 |