Language Identification

Language identification is the task of determining the language of a text.
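
As a concrete illustration of the task, here is a minimal sketch of one classic approach: a character n-gram classifier. Specifically, it is a multinomial naive Bayes model with add-one smoothing and a uniform prior over languages. It is not the method of any paper listed below. The names NGramLanguageIdentifier and char_ngrams, and the toy training sentences, are hypothetical.

```python
import math
from collections import Counter


def char_ngrams(text, n=3):
    """Return the character n-grams of a lowercased, space-padded string."""
    padded = f" {text.lower()} "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


class NGramLanguageIdentifier:
    """Hypothetical multinomial naive Bayes over character n-grams.

    Uses add-one smoothing and assumes a uniform prior over languages,
    so only the n-gram likelihoods decide the prediction.
    """

    def __init__(self, n=3):
        self.n = n
        self.counts = {}     # language -> Counter of n-gram frequencies
        self.totals = {}     # language -> total number of n-grams seen
        self.vocab = set()   # every n-gram seen in training, across languages

    def fit(self, samples):
        """Train on an iterable of (text, language) pairs."""
        for text, lang in samples:
            grams = char_ngrams(text, self.n)
            self.counts.setdefault(lang, Counter()).update(grams)
            self.vocab.update(grams)
        self.totals = {lang: sum(c.values()) for lang, c in self.counts.items()}
        return self

    def log_likelihoods(self, text):
        """Return the smoothed log-likelihood of `text` under each language."""
        grams = char_ngrams(text, self.n)
        v = len(self.vocab)
        return {
            lang: sum(math.log((c[g] + 1) / (self.totals[lang] + v)) for g in grams)
            for lang, c in self.counts.items()
        }

    def predict(self, text):
        """Return the language whose model gives `text` the highest likelihood."""
        scores = self.log_likelihoods(text)
        return max(scores, key=scores.get)


# Hypothetical toy training data; real systems train on far larger corpora.
train = [
    ("the quick brown fox jumps over the lazy dog", "en"),
    ("this is a small example of english text", "en"),
    ("le renard brun saute par-dessus le chien paresseux", "fr"),
    ("ceci est un petit exemple de texte en français", "fr"),
    ("der schnelle braune fuchs springt über den faulen hund", "de"),
    ("dies ist ein kleines beispiel für deutschen text", "de"),
]

clf = NGramLanguageIdentifier(n=3).fit(train)
for sample in ["the lazy dog", "le chien paresseux", "der faule hund"]:
    print(sample, "->", clf.predict(sample))
```

Character n-grams are a common choice for this task because they capture spelling patterns without requiring a tokenizer. They still work poorly on very short or code-mixed inputs, which is the setting many of the papers below address.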

Papers


Showing 251–300 of 794 papers

Title	Date	Tasks	Status
Ensemble Methods for Native Language Identification	Sep 1, 2017	Language Acquisition, Language Identification	Unverified
Evaluating HeLI with Non-Linear Mappings	Apr 1, 2017	Language Identification, Position	Unverified
Evaluating Input Representation for Language Identification in Hindi-English Code Mixed Text	Nov 23, 2020	Language Identification, Sentence	Unverified
Evaluating Standard and Dialectal Frisian ASR: Multilingual Fine-tuning and Language Identification for Improved Low-resource Performance	Feb 7, 2025	Automatic Speech Recognition, Automatic Speech Recognition (ASR)	Unverified
Cross-Corpora Spoken Language Identification with Domain Diversification and Generalization	Feb 10, 2023	Data Augmentation, Domain Generalization	Unverified
Automatic Detection of Intra-Word Code-Switching	Aug 1, 2016	Language Identification	Unverified
Cross-Corpora Language Recognition: A Preliminary Investigation with Indian Languages	May 10, 2021	Language Identification, Spoken language identification	Unverified
Automatic Detection of Code-switching Style from Acoustics	Jul 1, 2018	Automatic Speech Recognition (ASR), Language Identification	Unverified
A Neural Model for Language Identification in Code-Switched Tweets	Nov 1, 2016	Language Identification, Language Modeling	Unverified
A Fast, Compact, Accurate Model for Language Identification of Codemixed Text	Oct 9, 2018	Decoder, Language Identification	Unverified
CoSwID, a Code Switching Identification Method Suitable for Under-Resourced Languages	Jun 1, 2022	Language Identification	Unverified
Automatic Detection of Arabicized Berber and Arabic Varieties	Dec 1, 2016	Language Identification	Unverified
Corpus Creation and Language Identification in Low-Resource Code-Mixed Telugu-English Text	Sep 1, 2021	Classification, Language Identification	Unverified
Corpora of social media in minority Uralic languages	Jan 1, 2019	Language Identification	Unverified
Automatic Detection and Language Identification of Multilingual Documents	Jan 1, 2014	Language Identification, Machine Translation	Unverified
An Attention Based Neural Network for Code Switching Detection: English & Roman Urdu	Mar 3, 2021	Language Identification	Unverified
Coreference Resolution in FreeLing 4.0	May 1, 2018	Constituency Parsing, coreference-resolution	Unverified
ConvAI at SemEval-2019 Task 6: Offensive Language Identification and Categorization with Perspective and BERT	Jun 1, 2019	Language Identification	Unverified
Automatic Classification of Spoken Languages using Diverse Acoustic Features	Oct 1, 2015	Classification, General Classification	Unverified
Confidence-based Ensembles of End-to-End Speech Recognition Models	Jun 27, 2023	Language Identification, Model Selection	Unverified
Computationally efficient discrimination between language varieties with large feature vectors and regularized classifiers	Aug 1, 2018	General Classification, Language Identification	Unverified
Automated speech tools for helping communities process restricted-access corpora for language revival efforts	Apr 15, 2022	Action Detection, Activity Detection	Unverified
An Assessment of Language Identification Methods on Tweets and Wikipedia Articles	Jul 1, 2020	Articles, Information Retrieval	Unverified
Adversarial Training for Multilingual Acoustic Modeling	Jun 17, 2019	Automatic Speech Recognition, Automatic Speech Recognition (ASR)	Unverified
Adaptation de domaine non supervisée pour la reconnaissance de la langue par régularisation d'un réseau de neurones (Unsupervised domain adaptation for language identification by regularization of a neural network)	Jun 1, 2020	Domain Adaptation, Language Identification	Unverified
Computational Approaches to Arabic-English Code-Switching	Oct 17, 2024	Data Augmentation, Language Identification	Unverified
Comparing Two Basic Methods for Discriminating Between Similar Languages and Varieties	Dec 1, 2016	Automatic Speech Recognition (ASR), General Classification	Unverified
Automated essay scoring with string kernels and word embeddings	Apr 21, 2018	Automated Essay Scoring, Dialect Identification	Unverified
Comparing Approaches to the Identification of Similar Languages	Sep 1, 2015	Language Identification	Unverified
Augmented Transformers with Adaptive n-grams Embedding for Multilingual Scene Text Recognition	Feb 28, 2023	Language Identification, Scene Text Recognition	Unverified
Analysis of Twitter Data for Postmarketing Surveillance in Pharmacovigilance	Dec 1, 2016	Language Identification, Pharmacovigilance	Unverified
Comparing Approaches to Dravidian Language Identification	Mar 9, 2021	Dialect Identification, Language Identification	Unverified
A Two-level Classifier for Discriminating Similar Languages	Sep 1, 2015	Language Identification, Machine Translation	Unverified
ComMA@ICON: Multilingual Gender Biased and Communal Language Identification Task at ICON-2021	Dec 1, 2021	Aggression Identification, Classification	Unverified
COMI-LINGUA: Expert Annotated Large-Scale Dataset for Multitask NLP in Hindi-English Code-Mixing	Mar 27, 2025	Language Identification, named-entity-recognition	Unverified
A Twitter BERT Approach for Offensive Language Detection in Marathi	Dec 20, 2022	Data Augmentation, Language Identification	Unverified
Analysis of Named Entity Recognition and Linking for Tweets	Oct 27, 2014	Entity Disambiguation, Language Identification	Unverified
Adversarial synthesis based data-augmentation for code-switched spoken language identification	May 30, 2022	Automatic Speech Recognition, Automatic Speech Recognition (ASR)	Unverified
Combining Textual and Speech Features in the NLI Task Using State-of-the-Art Machine Learning Techniques	Sep 1, 2017	Language Acquisition, Language Identification	Unverified
Combining Shallow and Linguistically Motivated Features in Native Language Identification	Jun 1, 2013	Language Identification, Native Language Identification	Unverified
A Turkish-German Code-Switching Corpus	May 1, 2016	Language Identification, Sentence	Unverified
Columbia-Jadavpur submission for EMNLP 2016 Code-Switching Workshop Shared Task: System description	Nov 1, 2016	Language Identification	Unverified
A Multi-Task Text Classification Pipeline with Natural Language Explanations: A User-Centric Evaluation in Sentiment Analysis and Offensive Language Identification in Greek Tweets	Oct 14, 2024	Feature Importance, Language Identification	Unverified
Collecting Code-Switched Data from Social Media	May 1, 2018	Language Identification, Language Modeling	Unverified
CoLI-Machine Learning Approaches for Code-mixed Language Identification at the Word Level in Kannada-English Texts	Nov 17, 2022	Language Identification, Sentence	Unverified
Attention-Guided Adaptation for Code-Switching Speech Recognition	Dec 14, 2023	Language Identification, speech-recognition	Unverified
CoLi at UdS at SemEval-2020 Task 12: Offensive Tweet Detection with Ensembling	Dec 1, 2020	BIG-bench Machine Learning, Language Identification	Unverified
Cognitive Computing to Optimize IT Services	Dec 28, 2021	Language Identification, Text Summarization	Unverified
A Text-to-Text Model for Multilingual Offensive Language Identification	Dec 6, 2023	Decoder, Language Identification	Unverified
Amrita_CEN_NLP@DravidianLangTech-EACL2021: Deep Learning-based Offensive Language Identification in Malayalam, Tamil and Kannada	Apr 1, 2021	Language Identification	Unverified

Datasets: VoxLingua107, GlotLID-C, Nordic Language Identification, OpenSubtitles, Universal Dependencies, VoxForge

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	wav2vec 2.0 LV-60K	Error rate	7.2	—	Unverified
2	XLS-R	Error rate	5.7	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	GlotLID	Macro F1	0.98	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	FastText	Accuracy	0.97	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	Apple bi-LSTM	Accuracy	91.37	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	Apple bi-LSTM	Accuracy	86.93	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	ConformerG-P	Accuracy	99.8	—	Unverified
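
The benchmark tables report claimed scores in three metrics: error rate, macro F1 and accuracy. The following minimal Python sketch shows how these are commonly computed for single-label language identification. It assumes that error rate means 1 - accuracy. That definition is standard for classification, but it may not match how every listed system was scored. Macro F1 takes the unweighted mean of per-language F1, so rare languages count as much as frequent ones. The claimed values also appear to mix fractions (for example 0.98) with percentages (for example 91.37), while these functions return fractions. The function names and the example labels are hypothetical.

```python
from collections import Counter


def _check(gold, pred):
    """Reject empty or mismatched label sequences."""
    if not gold or len(gold) != len(pred):
        raise ValueError("gold and pred must be non-empty and equally long")


def accuracy(gold, pred):
    """Fraction of items whose predicted label equals the gold label."""
    _check(gold, pred)
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def error_rate(gold, pred):
    """Classification error rate, assumed here to be 1 - accuracy."""
    return 1.0 - accuracy(gold, pred)


def macro_f1(gold, pred):
    """Unweighted mean of per-language F1 over every label seen in gold or pred."""
    _check(gold, pred)
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # p was predicted but is wrong
            fn[g] += 1  # the true label g was missed
    scores = []
    for label in set(gold) | set(pred):
        p_den = tp[label] + fp[label]
        r_den = tp[label] + fn[label]
        precision = tp[label] / p_den if p_den else 0.0
        recall = tp[label] / r_den if r_den else 0.0
        pr = precision + recall
        scores.append(2 * precision * recall / pr if pr else 0.0)
    return sum(scores) / len(scores)


# Hypothetical gold labels and predictions for four short texts.
gold = ["en", "en", "fr", "de"]
pred = ["en", "fr", "fr", "de"]
print(accuracy(gold, pred))             # 0.75
print(error_rate(gold, pred))           # 0.25
print(round(macro_f1(gold, pred), 4))   # 0.7778
```

In this example, English and French each score an F1 of 2/3 and German scores 1. Macro F1 is therefore 7/9, which is close to the 0.75 accuracy but not identical to it.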