Language Identification

Language identification is the task of determining the language of a text.

Papers

Showing 151–200 of 794 papers

Title	Date	Tasks	Status
Code Mixing: A Challenge for Language Identification in the Language of Social Media	Oct 1, 2014	Language Identification	Unverified
Code Switched and Code Mixed Speech Recognition for Indic languages	Mar 30, 2022	Automatic Speech Recognition, Automatic Speech Recognition (ASR)	Unverified
A Compact End-to-End Model with Local and Global Context for Spoken Language Identification	Oct 27, 2022	Language Identification, Spoken language identification	Unverified
Code-Switched Named Entity Recognition with Embedding Attention	Jul 1, 2018	Language Identification, named-entity-recognition	Unverified
Codeswitching language identification using Subword Information Enriched Word Vectors	Nov 1, 2016	Language Identification, Named Entity Recognition (NER)	Unverified
Code-Switching Ubique Est - Language Identification and Part-of-Speech Tagging for Historical Mixed Text	Aug 1, 2016	Language Identification, Part-Of-Speech Tagging	Unverified
Codewithzichao@DravidianLangTech-EACL2021: Exploring Multilingual Transformers for Offensive Language Identification on Code Mixing Text	Apr 1, 2021	Language Identification	Unverified
Cognate and Misspelling Features for Natural Language Identification	Jun 1, 2013	Language Identification	Unverified
Cognitive Computing to Optimize IT Services	Dec 28, 2021	Language Identification, Text Summarization	Unverified
CoLi at UdS at SemEval-2020 Task 12: Offensive Tweet Detection with Ensembling	Dec 1, 2020	BIG-bench Machine Learning, Language Identification	Unverified
CoLI-Machine Learning Approaches for Code-mixed Language Identification at the Word Level in Kannada-English Texts	Nov 17, 2022	Language Identification, Sentence	Unverified
Collecting Code-Switched Data from Social Media	May 1, 2018	Language Identification, Language Modeling	Unverified
Columbia-Jadavpur submission for EMNLP 2016 Code-Switching Workshop Shared Task: System description	Nov 1, 2016	Language Identification	Unverified
Attentive Temporal Pooling for Conformer-based Streaming Language Identification in Long-form Speech	Feb 24, 2022	Domain Adaptation, Form	Unverified
Combining Shallow and Linguistically Motivated Features in Native Language Identification	Jun 1, 2013	Language Identification, Native Language Identification	Unverified
Combining Textual and Speech Features in the NLI Task Using State-of-the-Art Machine Learning Techniques	Sep 1, 2017	Language Acquisition, Language Identification	Unverified
COMI-LINGUA: Expert Annotated Large-Scale Dataset for Multitask NLP in Hindi-English Code-Mixing	Mar 27, 2025	Language Identification, named-entity-recognition	Unverified
ComMA@ICON: Multilingual Gender Biased and Communal Language Identification Task at ICON-2021	Dec 1, 2021	Aggression Identification, Classification	Unverified
DeepAnalyzer at SemEval-2019 Task 6: A deep learning-based ensemble method for identifying offensive tweets	Jun 1, 2019	Language Identification, Part-Of-Speech Tagging	Unverified
Comparing Approaches to Dravidian Language Identification	Mar 9, 2021	Dialect Identification, Language Identification	Unverified
Comparing Approaches to the Identification of Similar Languages	Sep 1, 2015	Language Identification	Unverified
Augmented Transformers with Adaptive n-grams Embedding for Multilingual Scene Text Recognition	Feb 28, 2023	Language Identification, Scene Text Recognition	Unverified
Comparing Two Basic Methods for Discriminating Between Similar Languages and Varieties	Dec 1, 2016	Automatic Speech Recognition (ASR), General Classification	Unverified
Computational Approaches to Arabic-English Code-Switching	Oct 17, 2024	Data Augmentation, Language Identification	Unverified
Computationally efficient discrimination between language varieties with large feature vectors and regularized classifiers	Aug 1, 2018	General Classification, Language Identification	Unverified
Confidence-based Ensembles of End-to-End Speech Recognition Models	Jun 27, 2023	Language Identification, Model Selection	Unverified
ConvAI at SemEval-2019 Task 6: Offensive Language Identification and Categorization with Perspective and BERT	Jun 1, 2019	Language Identification	Unverified
Coreference Resolution in FreeLing 4.0	May 1, 2018	Constituency Parsing, coreference-resolution	Unverified
Corpora of social media in minority Uralic languages	Jan 1, 2019	Language Identification	Unverified
Corpus Creation and Language Identification in Low-Resource Code-Mixed Telugu-English Text	Sep 1, 2021	Classification, Language Identification	Unverified
CoSwID, a Code Switching Identification Method Suitable for Under-Resourced Languages	Jun 1, 2022	Language Identification	Unverified
Challenges of Computational Processing of Code-Switching	Oct 7, 2016	Automatic Speech Recognition, Automatic Speech Recognition (ASR)	Unverified
Automatic Detection of Code-switching Style from Acoustics	Jul 1, 2018	Automatic Speech Recognition (ASR), Language Identification	Unverified
Cross-Corpora Language Recognition: A Preliminary Investigation with Indian Languages	May 10, 2021	Language Identification, Spoken language identification	Unverified
Cross-Corpora Spoken Language Identification with Domain Diversification and Generalization	Feb 10, 2023	Data Augmentation, Domain Generalization	Unverified
Cross-corpus Native Language Identification via Statistical Embedding	Jun 1, 2018	Cross-corpus, Language Identification	Unverified
Automatic Detection of Sentence Fragments	Jul 1, 2015	Grammatical Error Correction, Language Identification	Unverified
Cross-domain Feature Selection for Language Identification	Nov 11, 2011	feature selection, Language Identification	Unverified
Cross-lingual Inductive Transfer to Detect Offensive Language	Jul 7, 2020	Language Identification, Position	Unverified
An Exploratory Analysis of the Relation Between Offensive Language and Mental Health	May 31, 2021	Depression Detection, Language Identification	Unverified
Challenges in Neural Language Identification: NRC at VarDial 2020	Dec 1, 2020	Language Identification	Unverified
cs@DravidianLangTech-EACL2021: Offensive Language Identification Based On Multilingual BERT Model	Apr 1, 2021	Language Identification, text-classification	Unverified
CulturaX: A Cleaned, Enormous, and Multilingual Dataset for Large Language Models in 167 Languages	Sep 17, 2023	Hallucination, Language Identification	Unverified
Curriculum Design for Code-switching: Experiments with Language Identification and Language Modeling with Deep Neural Networks	Dec 1, 2017	Language Identification, Language Modeling	Unverified
A Report on the VarDial Evaluation Campaign 2020	Dec 1, 2020	Dialect Identification, Language Identification	Unverified
CUSATNLP@HASOC-Dravidian-CodeMix-FIRE2020: Identifying Offensive Language from Manglish Tweets	Oct 17, 2020	Information Retrieval, Language Identification	Unverified
Automatic Identification of Maghreb Dialects Using a Dictionary-Based Approach	May 1, 2018	Information Retrieval, Language Identification	Unverified
Data Filtering using Cross-Lingual Word Embeddings	Jun 1, 2021	Cross-Lingual Word Embeddings, Language Identification	Unverified
DCU-UVT: Word-Level Language Classification with Code-Mixed Data	Oct 1, 2014	Classification, General Classification	Unverified
Ceasing hate with MoH: Hate Speech Detection in Hindi-English Code-Switched Language	Oct 18, 2021	Hate Speech Detection, Language Identification	Unverified

Datasets: VOXLINGUA107, GlotLID-C, Nordic Language Identification, OpenSubtitles, Universal Dependencies, VoxForge

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	wav2vec 2.0 LV-60K	Error rate	7.2	—	Unverified
2	XLS-R	Error rate	5.7	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	GlotLID	Macro F1	0.98	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	FastText	Accuracy	0.97	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	Apple bi-LSTM	Accuracy	91.37	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	Apple bi-LSTM	Accuracy	86.93	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	ConformerG-P	Accuracy	99.8	—	Unverified