Language Identification

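Language identification is the task of determining which language a piece of text is written in; its spoken counterpart, spoken language identification, determines the language of a speech recording.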
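
To make the task concrete, here is a minimal sketch of one classic approach to text language identification: a multinomial naive Bayes classifier over character trigrams with add-one smoothing. It is not the method of any paper listed on this page. The class name `NGramLanguageID`, the trigram size and the six-sentence training set are hypothetical choices for illustration; a usable system would train on far more text per language.

```python
import math
from collections import Counter


def char_ngrams(text, n=3):
    """Overlapping character n-grams of the lowercased text, padded with spaces."""
    padded = f" {text.lower()} "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


class NGramLanguageID:
    """Hypothetical sketch: naive Bayes over character n-grams, add-one smoothing."""

    def __init__(self, n=3):
        self.n = n
        self.counts = {}   # language -> Counter of n-gram counts
        self.totals = {}   # language -> total number of n-grams seen
        self.vocab = set()

    def fit(self, samples):
        """samples: iterable of (text, language) pairs."""
        for text, lang in samples:
            grams = char_ngrams(text, self.n)
            self.counts.setdefault(lang, Counter()).update(grams)
            self.vocab.update(grams)
        self.totals = {lang: sum(c.values()) for lang, c in self.counts.items()}
        return self

    def score(self, text, lang):
        """Log-likelihood of text under one language (no prior term, i.e. uniform prior)."""
        counts, total = self.counts[lang], self.totals[lang]
        v = len(self.vocab) + 1  # +1 reserves probability mass for unseen n-grams
        return sum(math.log((counts[g] + 1) / (total + v))
                   for g in char_ngrams(text, self.n))

    def predict(self, text):
        """Return the language with the highest log-likelihood."""
        return max(self.counts, key=lambda lang: self.score(text, lang))


# Toy training data (hypothetical, for illustration only).
train = [
    ("the cat sat on the mat and the dog barked", "en"),
    ("this is a simple english sentence about the weather", "en"),
    ("le chat est sur le tapis et le chien aboie", "fr"),
    ("ceci est une phrase simple en français sur le temps", "fr"),
    ("der hund bellt und die katze sitzt auf der matte", "de"),
    ("dies ist ein einfacher deutscher satz über das wetter", "de"),
]
model = NGramLanguageID().fit(train)
print(model.predict("the weather is nice"))
```

Character n-grams are a common feature choice because they capture spelling and morphology without needing a tokenizer, which also keeps them usable for scripts that do not separate words with spaces.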

Papers

Showing 151–200 of 794 papers

Title	Date	Tasks	Status
AlexU-BackTranslation-TL at SemEval-2020 Task 12: Improving Offensive Language Detection Using Data Augmentation and Transfer Learning	Dec 1, 2020	Data Augmentation, Language Identification	Unverified
BRUMS at SemEval-2020 Task 12 : Transformer based Multilingual Offensive Language Identification in Social Media	Oct 13, 2020	Language Identification	Unverified
Bootstrapping a historical commodities lexicon with SKOS and DBpedia	Apr 1, 2014	Chunking, Language Identification	Unverified
Arabic Dialect Identification in the Context of Bivalency and Code-Switching	May 1, 2018	Dialect Identification, Language Identification	Unverified
BNU-HKBU UIC NLP Team 2 at SemEval-2019 Task 6: Detecting Offensive Language Using BERT model	Jun 1, 2019	Language Identification, Sentence	Unverified
Bilingual Streaming ASR with Grapheme units and Auxiliary Monolingual Loss	Aug 11, 2023	Automatic Speech Recognition, Automatic Speech Recognition (ASR)	Unverified
A Pre-trained Transformer and CNN Model with Joint Language ID and Part-of-Speech Tagging for Code-Mixed Social-Media Text	Sep 1, 2021	Language Identification, Part-Of-Speech Tagging	Unverified
Albanian Language Identification in Text Documents	Jan 14, 2019	Articles, General Classification	Unverified
A deep-learning based native-language classification by using a latent semantic analysis for the NLI Shared Task 2017	Sep 1, 2017	Automatic Speech Recognition (ASR), Dimensionality Reduction	Unverified
BigSSL: Exploring the Frontier of Large-Scale Semi-Supervised Learning for Automatic Speech Recognition	Sep 27, 2021	Automatic Speech Recognition, Automatic Speech Recognition (ASR)	Unverified
BhamNLP at SemEval-2020 Task 12: An Ensemble of Different Word Embeddings and Emotion Transfer Learning for Arabic Offensive Language Identification in Social Media	Dec 1, 2020	Language Identification, Transfer Learning	Unverified
BFCAI at ComMA@ICON 2021: Support Vector Machines for Multilingual Gender Biased and Communal Language Identification	Dec 1, 2021	Language Identification	Unverified
A Portuguese Native Language Identification Dataset	Apr 30, 2018	Language Acquisition, Language Identification	Unverified
SOLID: A Large-Scale Semi-Supervised Dataset for Offensive Language Identification	Apr 29, 2020	Language Identification	Unverified
Beware Haters at ComMA@ICON: Sequence and Ensemble Classifiers for Aggression, Gender Bias and Communal Bias Identification in Indian Languages	Dec 1, 2021	Language Identification	Unverified
A Perplexity-Based Method for Similar Languages Discrimination	Apr 1, 2017	Language Identification	Unverified
BERT-based Multi-Task Model for Country and Province Level MSA and Dialectal Arabic Identification	Apr 1, 2021	Language Identification, Multi-Task Learning	Unverified
BERT-based Multi-Task Model for Country and Province Level Modern Standard Arabic and Dialectal Arabic Identification	Jun 23, 2021	Language Identification, Multi-Task Learning	Unverified
An Unsupervised Morphological Criterion for Discriminating Similar Languages	Dec 1, 2016	Language Identification, Text Categorization	Unverified
A language model based approach towards large scale and lightweight language identification systems	Oct 13, 2015	Language Identification, Language Modeling	Unverified
A Deep Generative Approach to Native Language Identification	Dec 1, 2020	BIG-bench Machine Learning, Language Identification	Unverified
A Code-Switching Corpus of Turkish-German Conversations	Apr 1, 2017	Automatic Speech Recognition (ASR), Language Identification	Unverified
Beefmoves: Dissemination, Diversity, and Dynamics of English Borrowings in a German Hip Hop Forum	Jul 1, 2012	Diversity, Language Identification	Unverified
Babler - Data Collection from the Web to Support Speech Recognition and Keyword Search	Aug 1, 2016	Automatic Speech Recognition (ASR), Language Identification	Unverified
An Overview of Indian Spoken Language Recognition from Machine Learning Perspective	Nov 30, 2022	Language Identification, Spoken language identification	Unverified
Automatic Token and Turn Level Language Identification for Code-Switched Text Dialog: An Analysis Across Language Pairs and Corpora	Jul 1, 2018	Language Identification, Spoken Language Understanding	Unverified
Automatic Spoken Language Identification using a Time-Delay Neural Network	May 19, 2022	Automatic Speech Recognition, Automatic Speech Recognition (ASR)	Unverified
A Novel Learnable Dictionary Encoding Layer for End-to-End Language Identification	Apr 2, 2018	Language Identification	Unverified
Automatic Spoken Language Identification Utilizing Acoustic and Phonetic Speech Information	Jun 1, 2004	Language Identification, speech-recognition	Unverified
Automatic language identity tagging on word and sentence-level in multilingual text sources: a case-study on Luxembourgish	May 1, 2014	Automatic Speech Recognition, Automatic Speech Recognition (ASR)	Unverified
AUTOMATIC LANGUAGE IDENTIFICATION USING DEEP NEURAL NETWORKS	May 4, 2014	Acoustic Modelling, Language Identification	Unverified
Automatic Language Identification System for Hindi and Magahi	Apr 13, 2018	Language Identification	Unverified
Annotation Efficient Language Identification from Weak Labels	Nov 1, 2020	Language Identification	Unverified
Addition of Code Mixed Features to Enhance the Sentiment Prediction of Song Lyrics	Jun 11, 2018	Language Identification, Opinion Mining	Unverified
Automatic Language Identification for Romance Languages using Stop Words and Diacritics	Jun 14, 2018	Language Identification	Unverified
Automatic Language Identification for Celtic Texts	Mar 9, 2022	Language Identification	Unverified
DCU-UVT: Word-Level Language Classification with Code-Mixed Data	Oct 1, 2014	Classification, General Classification	Unverified
Automatic language identification	Aug 1, 2001	Language Identification	Unverified
Anlirika: An LSTM–CNN Flow Twister for Spoken Language Identification	Jun 1, 2021	Language Identification, Spoken language identification	Unverified
Automatic Identification of Maghreb Dialects Using a Dictionary-Based Approach	May 1, 2018	Information Retrieval, Language Identification	Unverified
CUSATNLP@HASOC-Dravidian-CodeMix-FIRE2020:Identifying Offensive Language from ManglishTweets	Oct 17, 2020	Information Retrieval, Language Identification	Unverified
CUSATNLP@DravidianLangTech-EACL2021:Language Agnostic Classification of Offensive Content in Tweets	Apr 1, 2021	Language Identification, Position	Unverified
Automatic Identification of Learners' Language Background Based on Their Writing in Czech	Oct 1, 2013	Language Acquisition, Language Identification	Unverified
Curriculum Design for Code-switching: Experiments with Language Identification and Language Modeling with Deep Neural Networks	Dec 1, 2017	Language Identification, Language Modeling	Unverified
CulturaX: A Cleaned, Enormous, and Multilingual Dataset for Large Language Models in 167 Languages	Sep 17, 2023	Hallucination, Language Identification	Unverified
Automatic Identification of Closely-related Indian Languages: Resources and Experiments	Mar 26, 2018	Language Identification	Unverified
cs@DravidianLangTech-EACL2021: Offensive Language Identification Based On Multilingual BERT Model	Apr 1, 2021	Language Identification, text-classification	Unverified
Data Filtering using Cross-Lingual Word Embeddings	Jun 1, 2021	Cross-Lingual Word Embeddings, Language Identification	Unverified
Cross-Linguistic Offensive Language Detection: BERT-Based Analysis of Bengali, Assamese, & Bodo Conversational Hateful Content from Social Media	Dec 16, 2023	Language Identification	Unverified
Automatic discovery of Latin syntactic changes	Aug 1, 2016	Language Identification	Unverified
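
Several of the listed papers rely on lightweight lexical cues instead of trained statistical models, for example stop words and diacritics for Romance languages, or a dictionary-based approach for Maghreb dialects. The sketch below illustrates only the generic stop-word idea and is not either paper's method: it counts how many tokens of the input appear in each language's stop-word list and returns the language with the most hits. The function name `identify_by_stop_words` and the very short Spanish, Portuguese and Italian lists are hypothetical illustrations, far smaller than anything usable in practice.

```python
import re

# Hypothetical, hand-picked stop-word lists for illustration only;
# real systems derive much larger lists from corpora.
STOP_WORDS = {
    "es": {"el", "la", "de", "que", "y", "en", "los", "las", "por", "con"},
    "pt": {"o", "a", "de", "que", "e", "em", "os", "as", "não", "com"},
    "it": {"il", "la", "di", "che", "e", "in", "gli", "le", "non", "con"},
}


def identify_by_stop_words(text, stop_words=STOP_WORDS):
    """Return the language whose stop-word list covers the most tokens, or None if no token matches."""
    tokens = re.findall(r"\w+", text.lower())  # \w is Unicode-aware, so "não" stays one token
    hits = {lang: sum(tok in words for tok in tokens) for lang, words in stop_words.items()}
    best = max(hits, key=hits.get)  # ties resolve to the first language in dict order
    return best if hits[best] > 0 else None


print(identify_by_stop_words("el perro y los gatos de la casa"))
```

Because the lists overlap heavily (`de`, `la`, `que`, `e`, `con`), short inputs are easily ambiguous, which is one reason work on closely related languages adds further cues such as diacritics.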

Benchmark Results

VoxLingua107
#	Model	Metric	Claimed	Verified	Status
1	wav2vec 2.0 LV-60K	Error rate (%)	7.2	-	Unverified
2	XLS-R	Error rate (%)	5.7	-	Unverified

GlotLID-C
#	Model	Metric	Claimed	Verified	Status
1	GlotLID	Macro F1	0.98	-	Unverified

Nordic Language Identification
#	Model	Metric	Claimed	Verified	Status
1	FastText	Accuracy	0.97	-	Unverified

OpenSubtitles
#	Model	Metric	Claimed	Verified	Status
1	Apple bi-LSTM	Accuracy (%)	91.37	-	Unverified

Universal Dependencies
#	Model	Metric	Claimed	Verified	Status
1	Apple bi-LSTM	Accuracy (%)	86.93	-	Unverified

VoxForge
#	Model	Metric	Claimed	Verified	Status
1	ConformerG-P	Accuracy (%)	99.8	-	Unverified
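
The benchmark tables use three different metrics, and some scores are fractions (0.97) while others are percentages (91.37), so values cannot be compared across tables. The following is a minimal sketch of how these metrics are commonly defined for utterance- or document-level classification, assuming that "error rate" means the classification error (1 minus accuracy); each benchmark's exact evaluation protocol may differ. The function names are hypothetical.

```python
def accuracy(gold, pred):
    """Fraction of items whose predicted language matches the gold label."""
    if len(gold) != len(pred) or not gold:
        raise ValueError("gold and pred must be non-empty and the same length")
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def error_rate(gold, pred):
    """Classification error, assumed here to be 1 - accuracy."""
    return 1.0 - accuracy(gold, pred)


def macro_f1(gold, pred):
    """Unweighted mean of per-language F1 over every label seen in gold or pred."""
    labels = sorted(set(gold) | set(pred))
    f1_scores = []
    for label in labels:
        tp = sum(g == label and p == label for g, p in zip(gold, pred))
        fp = sum(g != label and p == label for g, p in zip(gold, pred))
        fn = sum(g == label and p != label for g, p in zip(gold, pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        f1_scores.append(f1)
    return sum(f1_scores) / len(f1_scores)


gold = ["en", "en", "fr", "de"]
pred = ["en", "fr", "fr", "de"]
print(accuracy(gold, pred))    # 0.75
print(error_rate(gold, pred))  # 0.25
print(macro_f1(gold, pred))    # 0.777... (mean of 2/3, 2/3 and 1)
```

Macro F1 weights every language equally, so rare languages count as much as frequent ones, whereas accuracy is dominated by whichever languages are most frequent in the test set.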