Language Identification

Language identification is the task of determining which language a given text or speech sample is in.
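As a rough illustration of the task for text, the sketch below scores an input against per-language character trigram counts and returns the best-scoring language. It is a toy baseline under stated assumptions, not the method of any paper listed here: the training sentences in `TOY_SAMPLES`, the function names, and the add-one smoothing are all illustrative choices.

```python
import math
from collections import Counter

# Hypothetical toy training data; a real system would use far more text.
TOY_SAMPLES = {
    "en": ["the quick brown fox jumps over the lazy dog",
           "this is a short english sentence"],
    "es": ["el rápido zorro marrón salta sobre el perro perezoso",
           "esta es una frase corta en español"],
    "de": ["der schnelle braune fuchs springt über den faulen hund",
           "das ist ein kurzer deutscher satz"],
}


def char_ngrams(text, n=3):
    """Return the overlapping character n-grams of a lowercased, space-padded string."""
    padded = f" {text.lower()} "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


def train_profiles(samples, n=3):
    """Count character n-grams per language; `samples` maps language code -> sentences."""
    profiles = {}
    for lang, sentences in samples.items():
        counts = Counter()
        for sentence in sentences:
            counts.update(char_ngrams(sentence, n))
        profiles[lang] = counts
    return profiles


def identify(text, profiles, n=3):
    """Return the language whose profile gives the highest smoothed log-likelihood."""
    grams = char_ngrams(text, n)
    best_lang, best_score = None, -math.inf
    for lang, counts in profiles.items():
        total = sum(counts.values())
        vocab = len(counts) + 1  # one extra slot for unseen n-grams (crude add-one smoothing)
        score = sum(math.log((counts[g] + 1) / (total + vocab)) for g in grams)
        if score > best_score:
            best_lang, best_score = lang, score
    return best_lang


profiles = train_profiles(TOY_SAMPLES)
print(identify("the dog is lazy", profiles))
print(identify("el perro es perezoso", profiles))
print(identify("der hund ist faul", profiles))
```

Character n-grams are used here because they need no tokenizer and work on very short inputs; the neural and speech-based systems in the papers below replace this scoring step with learned models.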

Papers

Showing 201–250 of 794 papers

Title	Date	Tasks	Status	Hype
Hyperseed: Unsupervised Learning with Vector Symbolic Architectures	Oct 15, 2021	Few-Shot Learning, Language Identification	Code Available	1
Mandarin-English Code-switching Speech Recognition with Self-supervised Speech Representation Models	Oct 7, 2021	Language Identification, Self-Supervised Learning	Unverified	0
Pretrained Transformers for Offensive Language Identification in Tanglish	Oct 6, 2021	Language Identification, Text Classification	Code Available	0
Is Attention always needed? A Case Study on Language Identification from Speech	Oct 5, 2021	Automatic Speech Recognition (ASR)	Unverified	0
BigSSL: Exploring the Frontier of Large-Scale Semi-Supervised Learning for Automatic Speech Recognition	Sep 27, 2021	Automatic Speech Recognition (ASR)	Unverified	0
Language Identification with a Reciprocal Rank Classifier	Sep 20, 2021	Domain Adaptation, Language Identification	Code Available	0
UPV at CheckThat! 2021: Mitigating Cultural Differences for Identifying Multilingual Check-worthy Claims	Sep 19, 2021	Fact Checking, Language Identification	Code Available	0
Unsupervised Personality-Aware Language Identification	Sep 17, 2021	Language Identification	Unverified	0
The futility of STILTs for the classification of lexical borrowings in Spanish	Sep 17, 2021	Language Identification, named-entity-recognition	Unverified	0
On the Language-specificity of Multilingual BERT and the Impact of Fine-tuning	Sep 14, 2021	Language Identification, Natural Language Inference	Code Available	0
FBERT: A Neural Transformer for Identifying Offensive Content	Sep 10, 2021	Language Identification, XLM-R	Unverified	0
Cross-lingual Offensive Language Identification for Low Resource Languages: The Case of Marathi	Sep 8, 2021	Language Identification, Transfer Learning	Code Available	0
A Pre-trained Transformer and CNN Model with Joint Language ID and Part-of-Speech Tagging for Code-Mixed Social-Media Text	Sep 1, 2021	Language Identification, Part-Of-Speech Tagging	Unverified	0
Fiction in Russian Translation: A Translationese Study	Sep 1, 2021	Binary Classification, Language Identification	Unverified	0
Corpus Creation and Language Identification in Low-Resource Code-Mixed Telugu-English Text	Sep 1, 2021	Classification, Language Identification	Unverified	0
Offensive Language Identification in Low-resourced Code-mixed Dravidian languages using Pseudo-labeling	Aug 27, 2021	Language Identification, Marketing	Code Available	0
Towards Offensive Language Identification for Tamil Code-Mixed YouTube Comments and Posts	Aug 24, 2021	Language Identification, Transfer Learning	Code Available	0
A Dual-Decoder Conformer for Multilingual Speech Recognition	Aug 22, 2021	Decoder, Language Identification	Unverified	0
Dyn-ASR: Compact, Multilingual Speech Recognition via Spoken Language and Accent Identification	Aug 4, 2021	Automatic Speech Recognition (ASR)	Unverified	0
OLR 2021 Challenge: Datasets, Rules and Baselines	Jul 23, 2021	Automatic Speech Recognition (ASR)	Unverified	0
Improved Language Identification Through Cross-Lingual Self-Supervised Learning	Jul 8, 2021	Automatic Speech Recognition (ASR)	Unverified	0
Oriental Language Recognition (OLR) 2020: Summary and Analysis	Jul 5, 2021	Dialect Identification, Language Identification	Unverified	0
Language Identification of Hindi-English tweets using code-mixed BERT	Jul 2, 2021	Language Identification, Transfer Learning	Unverified	0
Language Lexicons for Hindi-English Multilingual Text Processing	Jun 29, 2021	Language Identification	Unverified	0
A Simple and Efficient Probabilistic Language model for Code-Mixed Text	Jun 29, 2021	Information Retrieval, Language Identification	Unverified	0
BERT-based Multi-Task Model for Country and Province Level Modern Standard Arabic and Dialectal Arabic Identification	Jun 23, 2021	Language Identification, Multi-Task Learning	Unverified	0
DravidianCodeMix: Sentiment Analysis and Offensive Language Identification Dataset for Dravidian Languages in Code-Mixed Text	Jun 17, 2021	Language Identification, Sentiment Analysis	Code Available	1
SpeechBrain: A General-Purpose Speech Toolkit	Jun 8, 2021	Language Identification, Spoken Language Understanding	Code Available	1
SIGTYP 2021 Shared Task: Robust Spoken Language Identification	Jun 7, 2021	Domain Adaptation, Language Identification	Unverified	0
Active learning and negative evidence for language identification	Jun 1, 2021	Active Learning, Language Identification	Unverified	0
Self-Contextualized Attention for Abusive Language Identification	Jun 1, 2021	Abusive Language, Language Identification	Unverified	0
Transliteration for Low-Resource Code-Switching Texts: Building an Automatic Cyrillic-to-Latin Converter for Tatar	Jun 1, 2021	Language Identification, Transliteration	Unverified	0
Much Gracias: Semi-supervised Code-switch Detection for Spanish-English: How far can we get?	Jun 1, 2021	Language Identification	Unverified	0
Anlirika: An LSTM–CNN Flow Twister for Spoken Language Identification	Jun 1, 2021	Language Identification, Spoken language identification	Unverified	0
Language ID Prediction from Speech Using Self-Attentive Pooling	Jun 1, 2021	Language Identification, speech-recognition	Unverified	0
Data Filtering using Cross-Lingual Word Embeddings	Jun 1, 2021	Cross-Lingual Word Embeddings, Language Identification	Unverified	0
Singing Language Identification using a Deep Phonotactic Approach	May 31, 2021	Classification, Language Identification	Code Available	0
Low-Resource Spoken Language Identification Using Self-Attentive Pooling and Deep 1D Time-Channel Separable Convolutions	May 31, 2021	Language Identification, speech-recognition	Unverified	0
An Exploratory Analysis of the Relation Between Offensive Language and Mental Health	May 31, 2021	Depression Detection, Language Identification	Unverified	0
Multilingual Offensive Language Identification for Low-resource Languages	May 12, 2021	Language Identification, Transfer Learning	Unverified	0
Cross-Corpora Language Recognition: A Preliminary Investigation with Indian Languages	May 10, 2021	Language Identification, Spoken language identification	Unverified	0
Using Radio Archives for Low-Resource Speech Recognition: Towards an Intelligent Virtual Assistant for Illiterate Users	Apr 27, 2021	Language Identification, Representation Learning	Code Available	1
Language ID Prediction from Speech Using Self-Attentive Pooling and 1D-Convolutions	Apr 24, 2021	Language Identification, speech-recognition	Unverified	0
IIITK@DravidianLangTech-EACL2021: Offensive Language Identification and Meme Classification in Tamil, Malayalam and Kannada	Apr 17, 2021	Classification, Language Identification	Code Available	0
BERT-based Multi-Task Model for Country and Province Level MSA and Dialectal Arabic Identification	Apr 1, 2021	Language Identification, Multi-Task Learning	Unverified	0
Optimizing a Supervised Classifier for a Difficult Language Identification Problem	Apr 1, 2021	Language Identification, regression	Unverified	0
Findings of the VarDial Evaluation Campaign 2021	Apr 1, 2021	Dialect Identification, Language Identification	Unverified	0
N-gram and Neural Models for Uralic Language Identification: NRC at VarDial 2021	Apr 1, 2021	Language Identification	Unverified	0
Comparing the Performance of CNNs and Shallow Models for Language Identification	Apr 1, 2021	Dialect Identification, Language Identification	Code Available	0
Simon @ DravidianLangTech-EACL2021: Detecting Offensive Content in Kannada Language	Apr 1, 2021	Language Identification	Unverified	0

Datasets: VOXLINGUA107, GlotLID-C, Nordic Language Identification, OpenSubtitles, Universal Dependencies, VoxForge

Benchmark Results

VOXLINGUA107
#	Model	Metric	Claimed	Verified	Status
1	wav2vec 2.0 LV-60K	Error rate	7.2	-	Unverified
2	XLS-R	Error rate	5.7	-	Unverified

GlotLID-C
#	Model	Metric	Claimed	Verified	Status
1	GlotLID	Macro F1	0.98	-	Unverified

Nordic Language Identification
#	Model	Metric	Claimed	Verified	Status
1	FastText	Accuracy	0.97	-	Unverified

OpenSubtitles
#	Model	Metric	Claimed	Verified	Status
1	Apple bi-LSTM	Accuracy	91.37	-	Unverified

Universal Dependencies
#	Model	Metric	Claimed	Verified	Status
1	Apple bi-LSTM	Accuracy	86.93	-	Unverified

VoxForge
#	Model	Metric	Claimed	Verified	Status
1	ConformerG-P	Accuracy	99.8	-	Unverified
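
The benchmark tables report three different metrics (error rate, macro F1, accuracy), and the claimed values appear to mix fractions (0.97, 0.98) with percentages (91.37, 99.8). The sketch below shows how the three relate on a small made-up set of predictions. It assumes error rate means the percentage of misclassified samples and that macro F1 averages per-language F1 with equal weight; the function names and the example labels are illustrative, not taken from any benchmark.

```python
def accuracy(gold, pred):
    """Fraction of samples whose predicted language matches the gold label."""
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def error_rate_percent(gold, pred):
    """Percentage of misclassified samples (assumed definition of 'error rate')."""
    return 100.0 * (1.0 - accuracy(gold, pred))


def macro_f1(gold, pred):
    """Unweighted mean of per-language F1 over every label seen in gold or pred."""
    labels = sorted(set(gold) | set(pred))
    scores = []
    for label in labels:
        tp = sum(g == label and p == label for g, p in zip(gold, pred))
        fp = sum(g != label and p == label for g, p in zip(gold, pred))
        fn = sum(g == label and p != label for g, p in zip(gold, pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Made-up labels for illustration only.
gold = ["en", "en", "es", "es", "de", "de"]
pred = ["en", "es", "es", "es", "de", "en"]

print(f"accuracy   = {accuracy(gold, pred):.4f}")
print(f"error rate = {error_rate_percent(gold, pred):.2f}%")
print(f"macro F1   = {macro_f1(gold, pred):.4f}")
```

Because macro F1 weights every language equally, it penalizes errors on rare languages more than accuracy does, so scores under different metrics or on different datasets are not directly comparable.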