SOTAVerified

Language Identification

Language identification is the task of determining the language of a text.
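
As a rough illustration of the task, and not the method of any paper listed here, the sketch below guesses a language by comparing character-trigram counts against per-language profiles using cosine similarity. The training sentences in `SAMPLES`, the language codes, and the helper names (`trigrams`, `cosine`, `identify`) are all hypothetical choices made for this example. A practical system would need far more data and languages.

```python
from collections import Counter
import math

# Tiny illustrative training samples (hypothetical, not taken from any listed dataset).
SAMPLES = {
    "en": "the quick brown fox jumps over the lazy dog and this is the language of the text",
    "de": "der schnelle braune fuchs springt über den faulen hund und das ist die sprache des textes",
    "es": "el rápido zorro marrón salta sobre el perro perezoso y este es el idioma del texto",
}


def trigrams(text):
    """Count character trigrams of the lowercased text, padded with spaces."""
    padded = f"  {text.lower()} "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def cosine(a, b):
    """Cosine similarity between two trigram count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# One trigram profile per language, built once from the samples above.
PROFILES = {lang: trigrams(text) for lang, text in SAMPLES.items()}


def identify(text):
    """Return the language code whose profile is most similar to the text."""
    query = trigrams(text)
    return max(PROFILES, key=lambda lang: cosine(query, PROFILES[lang]))


if __name__ == "__main__":
    for sentence in ["the dog is in the house", "das ist der hund", "el perro es rápido"]:
        print(sentence, "->", identify(sentence))
```

Character n-grams are used here only because they need no tokenizer and keep the example short. Very short or code-mixed inputs, which several of the listed papers target, are exactly where such a simple profile comparison tends to break down.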

Papers

Showing 1-50 of 794 papers

Title | Status | Hype
CosyVoice 3: Towards In-the-wild Speech Generation via Scaling-up and Post-training | Code | 11
TweetNLP: Cutting-Edge Natural Language Processing for Social Media | Code | 2
MathPile: A Billion-Token-Scale Pretraining Corpus for Math | Code | 2
An Initial Investigation of Language Adaptation for TTS Systems under Low-resource Scenarios | Code | 2
Word-level Language Identification Using Subword Embeddings for Code-mixed Bangla-English Social Media Data | Code | 1
Language-Informed Beam Search Decoding for Multilingual Machine Translation | Code | 1
Language and Speech Technology for Central Kurdish Varieties | Code | 1
Language ID in the Wild: Unexpected Challenges on the Path to a Thousand-Language Web Text Corpus | Code | 1
PALI: A Language Identification Benchmark for Perso-Arabic Scripts | Code | 1
Triplet Entropy Loss: Improving The Generalisation of Short Speech Language Identification Systems | Code | 1
An Open Dataset and Model for Language Identification | Code | 1
Visual Speech Recognition for Languages with Limited Labeled Data using Automatic Labels from Whisper | Code | 1
SpeechBrain: A General-Purpose Speech Toolkit | Code | 1
MaskLID: Code-Switching Language Identification through Iterative Masking | Code | 1
GlotLID: Language Identification for Low-Resource Languages | Code | 1
KInIT at SemEval-2024 Task 8: Fine-tuned LLMs for Multilingual Machine-Generated Text Detection | Code | 1
DravidianCodeMix: Sentiment Analysis and Offensive Language Identification Dataset for Dravidian Languages in Code-Mixed Text | Code | 1
IndicSUPERB: A Speech Processing Universal Performance Benchmark for Indian languages | Code | 1
L3Cube-HingCorpus and HingBERT: A Code Mixed Hindi-English Dataset and BERT Language Models | Code | 1
NLPDove at SemEval-2020 Task 12: Improving Offensive Language Detection with Cross-lingual Transfer | Code | 1
SOLD: Sinhala Offensive Language Dataset | Code | 1
The first neural machine translation system for the Erzya language | Code | 1
AboutMe: Using Self-Descriptions in Webpages to Document the Effects of English Pretraining Data Filters | Code | 1
VoxLingua107: a Dataset for Spoken Language Recognition | Code | 1
AfroLID: A Neural Language Identification Tool for African Languages | Code | 1
XLS-R: Self-supervised Cross-lingual Speech Representation Learning at Scale | Code | 1
BERT-LID: Leveraging BERT to Improve Spoken Language Identification | Code | 1
Speech-MASSIVE: A Multilingual Speech Dataset for SLU and Beyond | Code | 1
Scaling Speech Technology to 1,000+ Languages | Code | 1
PHO-LID: A Unified Model Incorporating Acoustic-Phonetic and Phonotactic Information for Language Identification | Code | 1
A reproduction of Apple's bi-directional LSTM models for language identification in short strings | Code | 1
Bhasha-Abhijnaanam: Native-script and romanized Language Identification for 22 Indic languages | Code | 1
Common Voice: A Massively-Multilingual Speech Corpus | Code | 1
FastSpell: the LangId Magic Spell | Code | 1
KUISAIL at SemEval-2020 Task 12: BERT-CNN for Offensive Speech Identification in Social Media | Code | 1
GlotCC: An Open Broad-Coverage CommonCrawl Corpus and Pipeline for Minority Languages | Code | 1
Hyperseed: Unsupervised Learning with Vector Symbolic Architectures | Code | 1
Improving Spoken Language Identification with Map-Mix | Code | 1
Using Radio Archives for Low-Resource Speech Recognition: Towards an Intelligent Virtual Assistant for Illiterate Users | Code | 1
AlexU-BackTranslation-TL at SemEval-2020 Task 12: Improving Offensive Language Detection Using Data Augmentation and Transfer Learning | - | 0
Albanian Language Identification in Text Documents | - | 0
A deep-learning based native-language classification by using a latent semantic analysis for the NLI Shared Task 2017 | - | 0
SOLID: A Large-Scale Semi-Supervised Dataset for Offensive Language Identification | - | 0
A language model based approach towards large scale and lightweight language identification systems | - | 0
A Deep Generative Approach to Native Language Identification | - | 0
A Code-Switching Corpus of Turkish-German Conversations | - | 0
Addition of Code Mixed Features to Enhance the Sentiment Prediction of Song Lyrics | - | 0
Accurate Pinyin-English Codeswitched Language Identification | - | 0
A Federated Learning Approach to Privacy Preserving Offensive Language Identification | - | 0
A Dataset and Classifier for Recognizing Social Media English | - | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | wav2vec 2.0 LV-60K | Error rate | 7.2 | - | Unverified
2 | XLS-R | Error rate | 5.7 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | GlotLID | Macro F1 | 0.98 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | FastText | Accuracy | 0.97 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Apple bi-LSTM | Accuracy | 91.37 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Apple bi-LSTM | Accuracy | 86.93 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | ConformerG-P | Accuracy | 99.8 | - | Unverified