SOTAVerified

Language Identification

Language identification is the task of determining which language a given text or speech sample is in.
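As a rough illustration of the task for text, the sketch below labels a string by comparing its character trigram counts against per-language profiles using cosine similarity. This is a toy baseline and not the method of any paper listed on this page. The names `char_ngrams`, `cosine`, `identify`, `TRAIN`, and `PROFILES` are hypothetical, and the three training sentences are made-up samples that are far too small for real use.

```python
from collections import Counter
import math


def char_ngrams(text, n=3):
    """Count character n-grams of the lowercased text, padded with spaces."""
    padded = f" {text.lower()} "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def cosine(a, b):
    """Cosine similarity between two n-gram count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical toy training text, one short sample per language.
TRAIN = {
    "en": "the quick brown fox jumps over the lazy dog and this is the house where the cat lives",
    "de": "der schnelle braune fuchs springt über den faulen hund und das ist das haus in dem die katze wohnt",
    "es": "el rápido zorro marrón salta sobre el perro perezoso y esta es la casa donde vive el gato",
}

# One trigram profile per language, built once from the toy text.
PROFILES = {lang: char_ngrams(text) for lang, text in TRAIN.items()}


def identify(text):
    """Return the language code whose profile is most similar to the text."""
    query = char_ngrams(text)
    return max(PROFILES, key=lambda lang: cosine(query, PROFILES[lang]))


print(identify("das ist der hund"))
```

With these toy profiles, `identify("das ist der hund")` returns `"de"`. A realistic system would need far more training text per language and would have to handle short inputs, code-switching, and closely related languages, which several of the papers below address.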

Papers

Showing 1-50 of 794 papers

Title | Status | Hype
mSTEB: Massively Multilingual Evaluation of LLMs on Speech and Text Tasks | - | 0
Neighbors and relatives: How do speech embeddings reflect linguistic connections across the world? | - | 0
Recursive Semantic Anchoring in ISO 639:2023: A Structural Extension to ISO/TC 37 Frameworks | - | 0
TalTech Systems for the Interspeech 2025 ML-SUPERB 2.0 Challenge | - | 0
Improving Multilingual Speech Models on ML-SUPERB 2.0: Fine-tuning with Data Augmentation and LID-Aware CTC | - | 0
CosyVoice 3: Towards In-the-wild Speech Generation via Scaling-up and Post-training | Code | 11
Token Masking Improves Transformer-Based Text Classification | - | 0
Advancing Uto-Aztecan Language Technologies: A Case Study on the Endangered Comanche Language | Code | 0
Improving Informally Romanized Language Identification | - | 0
(Im)possibility of Automated Hallucination Detection in Large Language Models | - | 0
COMI-LINGUA: Expert Annotated Large-Scale Dataset for Multitask NLP in Hindi-English Code-Mixing | - | 0
KréyoLID From Language Identification Towards Language Mining | Code | 0
NusaAksara: A Multimodal and Multilingual Benchmark for Preserving Indonesian Indigenous Scripts | - | 0
English Please: Evaluating Machine Translation with Large Language Models for Multilingual Bug Reports | Code | 0
Multi-label Scandinavian Language Identification (SLIDE) | Code | 0
On the use of Performer and Agent Attention for Spoken Language Identification | - | 0
Evaluating Standard and Dialectal Frisian ASR: Multilingual Fine-tuning and Language Identification for Improved Low-resource Performance | - | 0
Is It Navajo? Accurate Language Detection in Endangered Athabaskan Languages | Code | 0
Fleurs-SLU: A Massively Multilingual Benchmark for Spoken Language Understanding | Code | 0
Indonesian-English Code-Switching Speech Synthesizer Utilizing Multilingual STEN-TTS and Bert LID | - | 0
Enhancing Code-Switching ASR Leveraging Non-Peaky CTC Loss and Deep Language Posterior Injection | - | 0
Exploring Facets of Language Generation in the Limit | - | 0
Can adversarial attacks by large language models be attributed? | - | 0
Prompt Engineering Using GPT for Word-Level Code-Mixed Language Identification in Low-Resource Dravidian Languages | - | 0
GlotCC: An Open Broad-Coverage CommonCrawl Corpus and Pipeline for Minority Languages | Code | 1
Computational Approaches to Arabic-English Code-Switching | - | 0
Generation through the lens of learning theory | - | 0
A Multi-Task Text Classification Pipeline with Natural Language Explanations: A User-Centric Evaluation in Sentiment Analysis and Offensive Language Identification in Greek Tweets | - | 0
From N-grams to Pre-trained Multilingual Models For Language Identification | Code | 0
Efficiently Identifying Low-Quality Language Subsets in Multilingual Datasets: A Case Study on a Large-Scale Multilingual Audio Dataset | - | 0
AfriHuBERT: A self-supervised speech representation model for African languages | Code | 0
Improving Multilingual ASR in the Wild Using Simple N-best Re-ranking | Code | 0
Leveraging Open-Source Large Language Models for Native Language Identification | - | 0
Enhancing Code-Switching Speech Recognition with LID-Based Collaborative Mixture of Experts Model | - | 0
Literary and Colloquial Dialect Identification for Tamil using Acoustic Features | - | 0
Language-Informed Beam Search Decoding for Multilingual Machine Translation | Code | 1
Speech-MASSIVE: A Multilingual Speech Dataset for SLU and Beyond | Code | 1
Towards Generalized Offensive Language Identification | - | 0
A Recipe of Parallel Corpora Exploitation for Multilingual Large Language Models | - | 0
SC-MoE: Switch Conformer Mixture of Experts for Unified Streaming and Non-streaming Code-Switching ASR | - | 0
Script-Agnostic Language Identification | Code | 0
Rapid Language Adaptation for Multilingual E2E Speech Recognition Using Encoder Prompting | - | 0
An Initial Investigation of Language Adaptation for TTS Systems under Low-resource Scenarios | Code | 2
Exploring Spoken Language Identification Strategies for Automatic Transcription of Multilingual Broadcast and Institutional Speech | - | 0
Soft Language Identification for Language-Agnostic Many-to-One End-to-End Speech Translation | - | 0
ML-SUPERB 2.0: Benchmarking Multilingual Speech Models Across Modeling Constraints, Languages, and Datasets | - | 0
MaskLID: Code-Switching Language Identification through Iterative Masking | Code | 1
Malayalam Sign Language Identification using Finetuned YOLOv8 and Computer Vision Techniques | - | 0
Whispy: Adapting STT Whisper Models to Real-Time Environments | - | 0
A Federated Learning Approach to Privacy Preserving Offensive Language Identification | - | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | wav2vec 2.0 LV-60K | Error rate | 7.2 | - | Unverified
2 | XLS-R | Error rate | 5.7 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | GlotLID | Macro F1 | 0.98 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | FastText | Accuracy | 0.97 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Apple bi-LSTM | Accuracy | 91.37 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Apple bi-LSTM | Accuracy | 86.93 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | ConformerG-P | Accuracy | 99.8 | - | Unverified