SOTAVerified

Language Identification

Language identification is the task of determining the language of a text.

Papers

Showing 1-25 of 794 papers

| Title | Status | Hype |
|---|---|---|
| Neighbors and relatives: How do speech embeddings reflect linguistic connections across the world? | | 0 |
| mSTEB: Massively Multilingual Evaluation of LLMs on Speech and Text Tasks | | 0 |
| Recursive Semantic Anchoring in ISO 639:2023: A Structural Extension to ISO/TC 37 Frameworks | | 0 |
| TalTech Systems for the Interspeech 2025 ML-SUPERB 2.0 Challenge | | 0 |
| Improving Multilingual Speech Models on ML-SUPERB 2.0: Fine-tuning with Data Augmentation and LID-Aware CTC | | 0 |
| CosyVoice 3: Towards In-the-wild Speech Generation via Scaling-up and Post-training | Code | 11 |
| Token Masking Improves Transformer-Based Text Classification | | 0 |
| Advancing Uto-Aztecan Language Technologies: A Case Study on the Endangered Comanche Language | Code | 0 |
| Improving Informally Romanized Language Identification | | 0 |
| (Im)possibility of Automated Hallucination Detection in Large Language Models | | 0 |
| COMI-LINGUA: Expert Annotated Large-Scale Dataset for Multitask NLP in Hindi-English Code-Mixing | | 0 |
| KréyoLID From Language Identification Towards Language Mining | Code | 0 |
| NusaAksara: A Multimodal and Multilingual Benchmark for Preserving Indonesian Indigenous Scripts | | 0 |
| English Please: Evaluating Machine Translation with Large Language Models for Multilingual Bug Reports | Code | 0 |
| Multi-label Scandinavian Language Identification (SLIDE) | Code | 0 |
| On the use of Performer and Agent Attention for Spoken Language Identification | | 0 |
| Evaluating Standard and Dialectal Frisian ASR: Multilingual Fine-tuning and Language Identification for Improved Low-resource Performance | | 0 |
| Is It Navajo? Accurate Language Detection in Endangered Athabaskan Languages | Code | 0 |
| Fleurs-SLU: A Massively Multilingual Benchmark for Spoken Language Understanding | Code | 0 |
| Indonesian-English Code-Switching Speech Synthesizer Utilizing Multilingual STEN-TTS and Bert LID | | 0 |
| Enhancing Code-Switching ASR Leveraging Non-Peaky CTC Loss and Deep Language Posterior Injection | | 0 |
| Exploring Facets of Language Generation in the Limit | | 0 |
| Can adversarial attacks by large language models be attributed? | | 0 |
| Prompt Engineering Using GPT for Word-Level Code-Mixed Language Identification in Low-Resource Dravidian Languages | | 0 |
| GlotCC: An Open Broad-Coverage CommonCrawl Corpus and Pipeline for Minority Languages | Code | 1 |

Benchmark Results

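| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | wav2vec 2.0 LV-60K | Error rate | 7.2 | | Unverified |
| 2 | XLS-R | Error rate | 5.7 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | GlotLID | Macro F1 | 0.98 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | FastText | Accuracy | 0.97 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | Apple bi-LSTM | Accuracy | 91.37 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | Apple bi-LSTM | Accuracy | 86.93 | | Unverified |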

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | ConformerG-P | Accuracy | 99.8 | | Unverified |