SOTAVerified

Language Identification

Language identification is the task of determining the language of a text.
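As a rough illustration of the task, here is a minimal sketch that guesses a text's language by comparing its character trigrams against per-language profiles. The three training sentences and the names `SAMPLES`, `trigrams`, and `identify` are assumptions made up for this example. They are not taken from any paper or benchmark listed on this page, and practical systems train on far larger corpora.

```python
from collections import Counter

# Tiny hand-written samples, purely illustrative (hypothetical data).
SAMPLES = {
    "en": "the quick brown fox jumps over the lazy dog and this is the language of the text",
    "de": "der schnelle braune fuchs springt auf den faulen hund und das ist die sprache des textes",
    "fr": "le renard brun rapide saute par dessus le chien paresseux et ceci est la langue du texte",
}


def trigrams(text):
    """Count character trigrams of the lowercased text, padded with spaces."""
    padded = f"  {text.lower()} "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


# One trigram frequency profile per language.
PROFILES = {lang: trigrams(sample) for lang, sample in SAMPLES.items()}


def identify(text):
    """Return the language code whose profile overlaps most with the text."""
    query = trigrams(text)

    def score(profile):
        total = sum(profile.values())
        # Weight each query trigram by its relative frequency in the profile.
        return sum(count * profile[gram] / total for gram, count in query.items())

    return max(PROFILES, key=lambda lang: score(PROFILES[lang]))


if __name__ == "__main__":
    for sentence in ("the dog is lazy", "der hund ist schnell", "le chien est paresseux"):
        print(sentence, "->", identify(sentence))
```

With profiles this small the sketch only works on inputs that share vocabulary with the samples. It is meant to show the shape of the problem, not a usable identifier.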

Papers

Showing 51-100 of 794 papers

| Title | Status | Hype |
| --- | --- | --- |
| FastSpell: the LangId Magic Spell | Code | 1 |
| What is Learnt by the LEArnable Front-end (LEAF)? Adapting Per-Channel Energy Normalisation (PCEN) to Noisy Conditions | Code | 0 |
| Geographically-Informed Language Identification | Code | 0 |
| More than words: Advancements and challenges in speech recognition for singing | - | 0 |
| Validating and Exploring Large Geographic Corpora | - | 0 |
| Aligning Speech to Languages to Enhance Code-switching Speech Recognition | - | 0 |
| Language and Speech Technology for Central Kurdish Varieties | Code | 1 |
| KInIT at SemEval-2024 Task 8: Fine-tuned LLMs for Multilingual Machine-Generated Text Detection | Code | 1 |
| OWSM-CTC: An Open Encoder-Only Speech Foundation Model for Speech Recognition, Translation, and Language Identification | Code | 0 |
| Code-Switched Language Identification is Harder Than You Think | Code | 0 |
| Detecting Structured Language Alternations in Historical Documents by Combining Language Identification with Fourier Analysis | - | 0 |
| Acoustic characterization of speech rhythm: going beyond metrics with recurrent neural networks | - | 0 |
| AboutMe: Using Self-Descriptions in Webpages to Document the Effects of English Pretraining Data Filters | Code | 1 |
| Language Detection for Transliterated Content | - | 0 |
| MathPile: A Billion-Token-Scale Pretraining Corpus for Math | Code | 2 |
| Generative linguistic representation for spoken language identification | - | 0 |
| Cross-Linguistic Offensive Language Detection: BERT-Based Analysis of Bengali, Assamese, & Bodo Conversational Hateful Content from Social Media | - | 0 |
| Leveraging Language ID to Calculate Intermediate CTC Loss for Enhanced Code-Switching Speech Recognition | - | 0 |
| Attention-Guided Adaptation for Code-Switching Speech Recognition | - | 0 |
| Native Language Identification with Large Language Models | - | 0 |
| Self-supervised Adaptive Pre-training of Multilingual Speech Models for Language and Dialect Identification | - | 0 |
| A Text-to-Text Model for Multilingual Offensive Language Identification | - | 0 |
| Offensive Language Identification in Transliterated and Code-Mixed Bangla | - | 0 |
| The Obscure Limitation of Modular Multilingual Language Models | - | 0 |
| Fumbling in Babel: An Investigation into ChatGPT's Language Identification Ability | - | 0 |
| OffMix-3L: A Novel Code-Mixed Dataset in Bangla-English-Hindi for Offensive Language Identification | Code | 0 |
| GlotLID: Language Identification for Low-Resource Languages | Code | 1 |
| Advanced accent/dialect identification and accentedness assessment with multi-embedding models and automatic speech recognition | - | 0 |
| Findings of the 2023 ML-SUPERB Challenge: Pre-Training and Evaluation over More Languages and Beyond | - | 0 |
| Wavelet Scattering Transform for Improving Generalization in Low-Resourced Spoken Language Identification | - | 0 |
| Multimodal Modeling For Spoken Language Identification | - | 0 |
| CulturaX: A Cleaned, Enormous, and Multilingual Dataset for Large Language Models in 167 Languages | - | 0 |
| Visual Speech Recognition for Languages with Limited Labeled Data using Automatic Labels from Whisper | Code | 1 |
| Native Language Identification with Big Bird Embeddings | Code | 0 |
| Robust Open-Set Spoken Language Identification and the CU MultiLang Dataset | - | 0 |
| Fine-Tuning Llama 2 Large Language Models for Detecting Online Sexual Predatory Chats and Abusive Texts | - | 0 |
| Bilingual Streaming ASR with Grapheme units and Auxiliary Monolingual Loss | - | 0 |
| Turkish Native Language Identification | - | 0 |
| MASR: Multi-label Aware Speech Representation | - | 0 |
| Multilingual Speech-to-Speech Translation into Multiple Target Languages | - | 0 |
| Towards spoken dialect identification of Irish | - | 0 |
| Confidence-based Ensembles of End-to-End Speech Recognition Models | - | 0 |
| My Boli: Code-mixed Marathi-English Corpora, Pretrained Language Models and Evaluation Benchmarks | Code | 0 |
| Unified model for code-switching speech recognition and language identification based on a concatenated tokenizer | Code | 0 |
| RoBERTweet: A BERT Language Model for Romanian Tweets | - | 0 |
| Leveraging Language Identification to Enhance Code-Mixed Text Classification | - | 0 |
| Label Aware Speech Representation Learning For Language Identification | - | 0 |
| Spoken Language Identification System for English-Mandarin Code-Switching Child-Directed Speech | Code | 0 |
| Simple yet Effective Code-Switching Language Identification with Multitask Pre-Training and Transfer Learning | - | 0 |
| MERLIon CCS Challenge Evaluation Plan | Code | 0 |

Benchmark Results

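| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | wav2vec 2.0 LV-60K | Error rate | 7.2 | - | Unverified |
| 2 | XLS-R | Error rate | 5.7 | - | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | GlotLID | Macro F1 | 0.98 | - | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | FastText | Accuracy | 0.97 | - | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | Apple bi-LSTM | Accuracy | 91.37 | - | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | Apple bi-LSTM | Accuracy | 86.93 | - | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | ConformerG-P | Accuracy | 99.8 | - | Unverified |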