SOTAVerified

Language Identification

Language identification is the task of determining the language of a text.
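
As a rough illustration of the task, the sketch below guesses a text's language by comparing its character trigram counts against per-language profiles. It is a toy example under stated assumptions, not the method of any paper listed on this page: the training sentences in `SAMPLES` and the names `trigrams`, `cosine`, `PROFILES`, and `identify` are all hypothetical, and a practical system would train on far larger corpora.

```python
from collections import Counter
import math

# Hypothetical toy training data (an assumption for illustration only);
# real language identifiers are trained on much larger corpora.
SAMPLES = {
    "en": "the quick brown fox jumps over the lazy dog and this is the language of the text",
    "de": "der schnelle braune fuchs springt über den faulen hund und das ist die sprache des textes",
    "fr": "le renard brun rapide saute par-dessus le chien paresseux et ceci est la langue du texte",
}


def trigrams(text):
    """Count overlapping character trigrams of the lowercased, space-padded text."""
    padded = f"  {text.lower()} "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def cosine(a, b):
    """Cosine similarity between two trigram count vectors."""
    dot = sum(a[g] * b[g] for g in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


# One trigram profile per language, built from the toy samples above.
PROFILES = {lang: trigrams(text) for lang, text in SAMPLES.items()}


def identify(text):
    """Return the language code whose profile is most similar to the text."""
    query = trigrams(text)
    return max(PROFILES, key=lambda lang: cosine(query, PROFILES[lang]))
```

With these toy samples, a call such as `identify("der hund ist faul")` selects `"de"`. Short or mixed-language inputs are much harder than this suggests, which is part of what the benchmarks on this page measure.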

Papers

Showing 26–50 of 794 papers

Title | Status | Hype
VoxLingua107: a Dataset for Spoken Language Recognition | Code | 1
DravidianCodeMix: Sentiment Analysis and Offensive Language Identification Dataset for Dravidian Languages in Code-Mixed Text | Code | 1
The first neural machine translation system for the Erzya language | Code | 1
Language ID in the Wild: Unexpected Challenges on the Path to a Thousand-Language Web Text Corpus | Code | 1
Visual Speech Recognition for Languages with Limited Labeled Data using Automatic Labels from Whisper | Code | 1
GlotCC: An Open Broad-Coverage CommonCrawl Corpus and Pipeline for Minority Languages | Code | 1
An Open Dataset and Model for Language Identification | Code | 1
A reproduction of Apple's bi-directional LSTM models for language identification in short strings | Code | 1
Hyperseed: Unsupervised Learning with Vector Symbolic Architectures | Code | 1
BERT-LID: Leveraging BERT to Improve Spoken Language Identification | Code | 1
Common Voice: A Massively-Multilingual Speech Corpus | Code | 1
Bhasha-Abhijnaanam: Native-script and romanized Language Identification for 22 Indic languages | Code | 1
KUISAIL at SemEval-2020 Task 12: BERT-CNN for Offensive Speech Identification in Social Media | Code | 1
Language-Informed Beam Search Decoding for Multilingual Machine Translation | Code | 1
GeezSwitch: Language Identification in Typologically Related Low-resourced East African Languages | Code | 0
Geographic Adaptation of Pretrained Language Models | Code | 0
From English to Code-Switching: Transfer Learning with Strong Morphological Clues | Code | 0
Fleurs-SLU: A Massively Multilingual Benchmark for Spoken Language Understanding | Code | 0
From N-grams to Pre-trained Multilingual Models For Language Identification | Code | 0
Geographically-Informed Language Identification | Code | 0
Aggressive Language Identification Using Word Embeddings and Sentiment Features | Code | 0
Finding Structure in Text, Genome and Other Symbolic Sequences | Code | 0
An Investigation into the Contribution of Locally Aggregated Descriptors to Figurative Language Identification | Code | 0
AfriHuBERT: A self-supervised speech representation model for African languages | Code | 0
FBK-DH at SemEval-2020 Task 12: Using Multi-channel BERT for Multilingual Offensive Language Detection | Code | 0
Page 2 of 32

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | wav2vec 2.0 LV-60K | Error rate | 7.2 | - | Unverified
2 | XLS-R | Error rate | 5.7 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | GlotLID | Macro F1 | 0.98 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | FastText | Accuracy | 0.97 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Apple bi-LSTM | Accuracy | 91.37 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Apple bi-LSTM | Accuracy | 86.93 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | ConformerG-P | Accuracy | 99.8 | - | Unverified