SOTAVerified

Language Identification

Language identification is the task of determining the language of a text.
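
As a rough illustration of the task, the sketch below scores a text against per-language character trigram counts and returns the best-scoring language. It is a toy baseline under stated assumptions, not the method of any paper listed on this page: the function names (`char_ngrams`, `train_profiles`, `identify`), the `SAMPLES` training sentences, and the add-one smoothing are all hypothetical choices made for the example.

```python
import math
from collections import Counter

# Hypothetical toy training data; a real system would use far larger corpora.
SAMPLES = {
    "english": ["the dog sleeps in the house", "this is a short sentence in english"],
    "german": ["der hund wohnt in dem haus", "dies ist ein kurzer satz auf deutsch"],
    "spanish": ["el perro duerme en la casa", "esta es una frase corta en castellano"],
}


def char_ngrams(text, n=3):
    """Return the overlapping character n-grams of a lowercased, space-padded text."""
    padded = f" {text.lower()} "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


def train_profiles(samples, n=3):
    """Build one n-gram count profile per language from {language: [texts]}."""
    return {
        lang: Counter(g for text in texts for g in char_ngrams(text, n))
        for lang, texts in samples.items()
    }


def identify(text, profiles, n=3):
    """Return the language with the highest add-one-smoothed log-likelihood."""
    vocab_size = len(set().union(*profiles.values()))
    grams = char_ngrams(text, n)

    def score(counts):
        total = sum(counts.values())
        # The extra +1 in the denominator reserves mass for unseen n-grams.
        return sum(
            math.log((counts[g] + 1) / (total + vocab_size + 1)) for g in grams
        )

    return max(profiles, key=lambda lang: score(profiles[lang]))


profiles = train_profiles(SAMPLES)
print(identify("the house is short", profiles))
print(identify("ein kurzer hund", profiles))
```

Character n-grams are used here only because they need no tokenizer and work on short inputs. The speech and code-mixed settings covered by many of the papers below need different features and models.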

Papers

Showing 101-150 of 794 papers

Title | Status | Hype
Offensive Language Identification in Transliterated and Code-Mixed Bangla | - | 0
The Obscure Limitation of Modular Multilingual Language Models | - | 0
Fumbling in Babel: An Investigation into ChatGPT's Language Identification Ability | - | 0
OffMix-3L: A Novel Code-Mixed Dataset in Bangla-English-Hindi for Offensive Language Identification | Code | 0
Advanced accent/dialect identification and accentedness assessment with multi-embedding models and automatic speech recognition | - | 0
Findings of the 2023 ML-SUPERB Challenge: Pre-Training and Evaluation over More Languages and Beyond | - | 0
Wavelet Scattering Transform for Improving Generalization in Low-Resourced Spoken Language Identification | - | 0
Multimodal Modeling For Spoken Language Identification | - | 0
CulturaX: A Cleaned, Enormous, and Multilingual Dataset for Large Language Models in 167 Languages | - | 0
Native Language Identification with Big Bird Embeddings | Code | 0
Robust Open-Set Spoken Language Identification and the CU MultiLang Dataset | - | 0
Fine-Tuning Llama 2 Large Language Models for Detecting Online Sexual Predatory Chats and Abusive Texts | - | 0
Bilingual Streaming ASR with Grapheme units and Auxiliary Monolingual Loss | - | 0
Turkish Native Language Identification | - | 0
MASR: Multi-label Aware Speech Representation | - | 0
Multilingual Speech-to-Speech Translation into Multiple Target Languages | - | 0
Towards spoken dialect identification of Irish | - | 0
Confidence-based Ensembles of End-to-End Speech Recognition Models | - | 0
My Boli: Code-mixed Marathi-English Corpora, Pretrained Language Models and Evaluation Benchmarks | Code | 0
Unified model for code-switching speech recognition and language identification based on a concatenated tokenizer | Code | 0
RoBERTweet: A BERT Language Model for Romanian Tweets | - | 0
Leveraging Language Identification to Enhance Code-Mixed Text Classification | - | 0
Label Aware Speech Representation Learning For Language Identification | - | 0
Spoken Language Identification System for English-Mandarin Code-Switching Child-Directed Speech | Code | 0
Simple yet Effective Code-Switching Language Identification with Multitask Pre-Training and Transfer Learning | - | 0
MERLIon CCS Challenge Evaluation Plan | Code | 0
Investigating model performance in language identification: beyond simple error statistics | Code | 0
MERLIon CCS Challenge: A English-Mandarin code-switching child-directed speech corpus for language identification and diarization | Code | 0
Script Normalization for Unconventional Writing of Under-Resourced Languages in Bilingual Communities | Code | 0
LIMIT: Language Identification, Misidentification, and Translation using Hierarchical Models in 350+ Languages | Code | 0
Multilingual Large Language Models Are Not (Yet) Code-Switchers | - | 0
ML-SUPERB: Multilingual Speech Universal PERformance Benchmark | - | 0
DocLangID: Improving Few-Shot Training to Identify the Language of Historical Documents | Code | 0
Lessons Learned in ATCO2: 5000 hours of Air Traffic Control Communications for Robust Automatic Speech Recognition and Understanding | - | 0
Approaches to Corpus Creation for Low-Resource Language Technology: the Case of Southern Kurdish and Laki | Code | 0
MMT: A Multilingual and Multi-Topic Indian Social Media Dataset | - | 0
Joint unsupervised and supervised learning for context-aware language identification | - | 0
Language Variety Identification with True Labels | Code | 0
Building High-accuracy Multilingual ASR with Gated Language Experts and Curriculum Training | - | 0
Augmented Transformers with Adaptive n-grams Embedding for Multilingual Scene Text Recognition | - | 0
Language identification as improvement for lip-based biometric visual systems | - | 0
Cross-Corpora Spoken Language Identification with Domain Diversification and Generalization | - | 0
A Twitter BERT Approach for Offensive Language Detection in Marathi | - | 0
An Overview of Indian Spoken Language Recognition from Machine Learning Perspective | - | 0
Transformer-based Model for Word Level Language Identification in Code-mixed Kannada-English Texts | - | 0
Predicting the Type and Target of Offensive Social Media Posts in Marathi | Code | 0
Scaling Native Language Identification with Transformer Adapters | - | 0
Overview of the HASOC Subtrack at FIRE 2022: Offensive Language Identification in Marathi | - | 0
CoLI-Machine Learning Approaches for Code-mixed Language Identification at the Word Level in Kannada-English Texts | - | 0
Accidental Learners: Spoken Language Identification in Multilingual Self-Supervised Models | - | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | wav2vec 2.0 LV-60K | Error rate | 7.2 | - | Unverified
2 | XLS-R | Error rate | 5.7 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | GlotLID | Macro F1 | 0.98 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | FastText | Accuracy | 0.97 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Apple bi-LSTM | Accuracy | 91.37 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Apple bi-LSTM | Accuracy | 86.93 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | ConformerG-P | Accuracy | 99.8 | - | Unverified