SOTAVerified

Language Identification

Language identification is the task of determining the language of a text.
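
As a rough illustration of the task, the sketch below labels a short text by comparing its character trigrams against per-language trigram counts. This is a toy example and not the method of any paper or benchmark listed on this page: the three training sentences and the names SAMPLES, char_ngrams, train, and identify are all assumptions made up for the illustration.

```python
import math
from collections import Counter

# Toy training text per language (made up for this sketch, not from any dataset).
SAMPLES = {
    "en": "the quick brown fox jumps over the lazy dog and then runs to the river",
    "de": "der schnelle braune fuchs springt über den faulen hund und läuft zum fluss",
    "es": "el rápido zorro marrón salta sobre el perro perezoso y corre hacia el río",
}


def char_ngrams(text, n=3):
    """Return the overlapping character n-grams of a lowercased, space-padded text."""
    padded = f" {text.lower()} "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


def train(samples, n=3):
    """Build one n-gram count profile per language."""
    return {lang: Counter(char_ngrams(text, n)) for lang, text in samples.items()}


def identify(text, profiles, n=3):
    """Return the language whose profile gives the text the highest log-likelihood."""
    # Shared vocabulary size (+1 for unseen n-grams) used for add-one smoothing.
    vocab_size = len(set().union(*profiles.values())) + 1
    best_lang, best_score = None, float("-inf")
    for lang, counts in profiles.items():
        total = sum(counts.values())
        score = sum(
            math.log((counts[gram] + 1) / (total + vocab_size))
            for gram in char_ngrams(text, n)
        )
        if score > best_score:
            best_lang, best_score = lang, score
    return best_lang


if __name__ == "__main__":
    profiles = train(SAMPLES)
    for sentence in ["the dog runs", "der hund läuft", "el perro corre"]:
        print(sentence, "->", identify(sentence, profiles))
```

With only one sentence per language, this is useful only for seeing the shape of the problem. The systems listed below train on far larger corpora and often work on speech or code-mixed input instead of clean monolingual text.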

Papers

Showing 101–150 of 794 papers

Title | Status | Hype
Investigating model performance in language identification: beyond simple error statistics | Code | 0
MERLIon CCS Challenge: A English-Mandarin code-switching child-directed speech corpus for language identification and diarization | Code | 0
Script Normalization for Unconventional Writing of Under-Resourced Languages in Bilingual Communities | Code | 0
Bhasha-Abhijnaanam: Native-script and romanized Language Identification for 22 Indic languages | Code | 1
An Open Dataset and Model for Language Identification | Code | 1
LIMIT: Language Identification, Misidentification, and Translation using Hierarchical Models in 350+ Languages | Code | 0
Multilingual Large Language Models Are Not (Yet) Code-Switchers | - | 0
Scaling Speech Technology to 1,000+ Languages | Code | 1
ML-SUPERB: Multilingual Speech Universal PERformance Benchmark | - | 0
DocLangID: Improving Few-Shot Training to Identify the Language of Historical Documents | Code | 0
Lessons Learned in ATCO2: 5000 hours of Air Traffic Control Communications for Robust Automatic Speech Recognition and Understanding | - | 0
Approaches to Corpus Creation for Low-Resource Language Technology: the Case of Southern Kurdish and Laki | Code | 0
PALI: A Language Identification Benchmark for Perso-Arabic Scripts | Code | 1
MMT: A Multilingual and Multi-Topic Indian Social Media Dataset | - | 0
Joint unsupervised and supervised learning for context-aware language identification | - | 0
Language Variety Identification with True Labels | Code | 0
Building High-accuracy Multilingual ASR with Gated Language Experts and Curriculum Training | - | 0
Augmented Transformers with Adaptive n-grams Embedding for Multilingual Scene Text Recognition | - | 0
Language identification as improvement for lip-based biometric visual systems | - | 0
Improving Spoken Language Identification with Map-Mix | Code | 1
Cross-Corpora Spoken Language Identification with Domain Diversification and Generalization | - | 0
A Twitter BERT Approach for Offensive Language Detection in Marathi | - | 0
SOLD: Sinhala Offensive Language Dataset | Code | 1
An Overview of Indian Spoken Language Recognition from Machine Learning Perspective | - | 0
Transformer-based Model for Word Level Language Identification in Code-mixed Kannada-English Texts | - | 0
Predicting the Type and Target of Offensive Social Media Posts in Marathi | Code | 0
Overview of the HASOC Subtrack at FIRE 2022: Offensive Language Identification in Marathi | - | 0
Scaling Native Language Identification with Transformer Adapters | - | 0
CoLI-Machine Learning Approaches for Code-mixed Language Identification at the Word Level in Kannada-English Texts | - | 0
Accidental Learners: Spoken Language Identification in Multilingual Self-Supervised Models | - | 0
LAMASSU: Streaming Language-Agnostic Multilingual Speech Recognition and Translation Using Neural Transducers | - | 0
A Compact End-to-End Model with Local and Global Context for Spoken Language Identification | - | 0
AfroLID: A Neural Language Identification Tool for African Languages | Code | 1
Italian Language and Dialect Identification and Regional French Variety Detection using Adaptive Naive Bayes | Code | 0
Neural Networks for Cross-domain Language Identification. Phlyers @Vardial 2022 | - | 0
The Curious Case of Logistic Regression for Italian Languages and Dialects Identification | Code | 0
OcWikiDisc: a Corpus of Wikipedia Talk Pages in Occitan | - | 0
The first neural machine translation system for the Erzya language | Code | 1
Streaming End-to-End Multilingual Speech Recognition with Joint Language Identification | - | 0
Evaluation of Off-the-Shelf Language Identification Tools on Bulgarian Social Media Posts | - | 0
IndicSUPERB: A Speech Processing Universal Performance Benchmark for Indian languages | Code | 1
Unravelling Interlanguage Facts via Explainable Machine Learning | - | 0
Extending RNN-T-based speech recognition systems with emotion and language classification | - | 0
Distilled Non-Semantic Speech Embeddings with Binary Neural Networks for Low-Resource Devices | Code | 0
Huqariq: A Multilingual Speech Corpus of Native Languages of Peru for Speech Recognition | - | 0
TechSSN at SemEval-2022 Task 6: Intended Sarcasm Detection using Transformer Models | - | 0
TweetNLP: Cutting-Edge Natural Language Processing for Social Media | Code | 2
Language Identification for Austronesian Languages | Code | 0
Word-level Language Identification Using Subword Embeddings for Code-mixed Bangla-English Social Media Data | Code | 1
CoSwID, a Code Switching Identification Method Suitable for Under-Resourced Languages | - | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | wav2vec 2.0 LV-60K | Error rate | 7.2 | - | Unverified
2 | XLS-R | Error rate | 5.7 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | GlotLID | Macro F1 | 0.98 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | FastText | Accuracy | 0.97 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Apple bi-LSTM | Accuracy | 91.37 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Apple bi-LSTM | Accuracy | 86.93 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | ConformerG-P | Accuracy | 99.8 | - | Unverified