Language Identification

Language identification is the task of determining the language of a text.
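As a minimal illustration of the task, the sketch below classifies a text by comparing its character trigram profile with one aggregated profile per language and picking the nearest profile by cosine similarity. The function and class names (`char_ngrams`, `cosine`, `NgramLanguageIdentifier`), the trigram size, and the tiny training sentences are hypothetical choices made for this example. It is not the method of any paper listed on this page, and practical identifiers are trained on far larger corpora.

```python
import math
from collections import Counter


def char_ngrams(text: str, n: int = 3) -> Counter:
    """Count character n-grams, padding with spaces so word boundaries are captured."""
    padded = f" {text.lower()} "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse n-gram count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())  # missing grams count as 0
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class NgramLanguageIdentifier:
    """Hypothetical toy identifier: one n-gram profile per language, nearest profile wins."""

    def __init__(self, n: int = 3):
        self.n = n
        self.profiles: dict[str, Counter] = {}

    def fit(self, samples: dict[str, list[str]]) -> "NgramLanguageIdentifier":
        # Aggregate the n-gram counts of all training texts for each language.
        for lang, texts in samples.items():
            profile = Counter()
            for text in texts:
                profile.update(char_ngrams(text, self.n))
            self.profiles[lang] = profile
        return self

    def predict(self, text: str) -> str:
        query = char_ngrams(text, self.n)
        # Return the language whose profile is most similar to the query text.
        return max(self.profiles, key=lambda lang: cosine(query, self.profiles[lang]))


# Hypothetical toy training data, for illustration only.
TRAIN = {
    "en": ["the cat sat on the mat", "where is the train station",
           "this is a small example of english text"],
    "de": ["die katze sitzt auf der matte", "wo ist der bahnhof",
           "das ist ein kleines beispiel für deutschen text"],
    "fr": ["le chat est assis sur le tapis", "où est la gare",
           "ceci est un petit exemple de texte français"],
}

clf = NgramLanguageIdentifier(n=3).fit(TRAIN)
print(clf.predict("where is the cat"))  # en
```

Character n-grams are a common feature choice here because they capture orthographic cues (diacritics, frequent letter sequences) without requiring word segmentation. With only a few sentences per language, however, predictions on short or unusual inputs will be unreliable.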

Papers

Showing 551–600 of 794 papers

Title	Date	Tasks	Status
Streaming End-to-End Multilingual Speech Recognition with Joint Language Identification	Sep 13, 2022	Automatic Speech Recognition, Automatic Speech Recognition (ASR)	Unverified
Streaming Language Identification using Combination of Acoustic Representations and ASR Hypotheses	Jun 1, 2020	Language Identification, speech-recognition	Unverified
String Kernels for Native Language Identification: Insights from Behind the Curtains	Sep 1, 2016	Language Identification, Native Language Identification	Unverified
Subdialectal Differences in Sorani Kurdish	Dec 1, 2016	General Classification, Information Retrieval	Unverified
Subsegmental language detection in Celtic language text	Aug 1, 2014	Language Identification, Language Modelling	Unverified
Subword-Level Language Identification for Intra-Word Code-Switching	Apr 3, 2019	Language Identification	Unverified
SU-NLP at SemEval-2020 Task 12: Offensive Language IdentifiCation in Turkish Tweets	Dec 1, 2020	Language Identification, Word Embeddings	Unverified
SwissAdmin: A multilingual tagged parallel corpus of press releases	May 1, 2014	Language Identification, Sentence	Unverified
Tübingen-Oslo at SemEval-2018 Task 2: SVMs perform better than RNNs in Emoji Prediction	Jun 1, 2018	Document Classification, General Classification	Unverified
Tübingen-Oslo Team at the VarDial 2018 Evaluation Campaign: An Analysis of N-gram Features in Language Variety Identification	Aug 1, 2018	Dialect Identification, Document Classification	Unverified
Tübingen system in VarDial 2017 shared task: experiments with language identification and cross-lingual parsing	Apr 1, 2017	Dependency Parsing, Language Identification	Unverified
Tackling the Score Shift in Cross-Lingual Speaker Verification by Exploiting Language Information	Oct 18, 2021	Language Identification, Speaker Recognition	Unverified
TalTech Systems for the Interspeech 2025 ML-SUPERB 2.0 Challenge	Jun 2, 2025	Language Identification, speech-recognition	Unverified
Team Rouges at SemEval-2020 Task 12: Cross-lingual Inductive Transfer to Detect Offensive Language	Dec 1, 2020	Language Identification, Position	Unverified
TECHSSN at SemEval-2020 Task 12: Offensive Language Detection Using BERT Embeddings	Dec 1, 2020	Language Identification	Unverified
TechSSN at SemEval-2022 Task 6: Intended Sarcasm Detection using Transformer Models	Jul 1, 2022	Language Identification, Sarcasm Detection	Unverified
Text Normalization Infrastructure that Scales to Hundreds of Language Varieties	May 1, 2018	Language Identification, Language Modeling	Unverified
Text segmentation for Language Identification in Greek Forums	Sep 1, 2013	Information Retrieval, Language Identification	Unverified
The ASRU 2019 Mandarin-English Code-Switching Speech Recognition Challenge: Open Datasets, Tracks, Methods and Results	Jul 12, 2020	Data Augmentation, Language Identification	Unverified
The CMU Submission for the Shared Task on Language Identification in Code-Switched Data	Oct 1, 2014	Language Identification, Learning Word Embeddings	Unverified
The French-Algerian Code-Switching Triggered audio corpus (FACST)	May 1, 2018	Language Identification, Transliteration	Unverified
The futility of STILTs for the classification of lexical borrowings in Spanish	Sep 17, 2021	Language Identification, named-entity-recognition	Unverified
The Howard University System Submission for the Shared Task in Language Identification in Spanish-English Codeswitching	Nov 1, 2016	Language Identification	Unverified
The ILSP/ARC submission to the WMT 2016 Bilingual Document Alignment Shared Task	Aug 1, 2016	ARC, Language Identification	Unverified
The IUCL+ System: Word-Level Language Identification via Extended Markov Models	Oct 1, 2014	Language Identification, Named Entity Recognition (NER)	Unverified
The Jinan Chinese Learner Corpus	Jun 1, 2015	Language Acquisition, Language Identification	Unverified
The MERLIN corpus: Learner language and the CEFR	May 1, 2014	Language Acquisition, Language Identification	Unverified
The Mysterious Letter J	Sep 1, 2013	Information Retrieval, Language Identification	Unverified
The NRC System for Discriminating Similar Languages	Aug 1, 2014	Language Identification, Machine Translation	Unverified
The Obscure Limitation of Modular Multilingual Language Models	Nov 21, 2023	Language Identification	Unverified
The Power of Character N-grams in Native Language Identification	Sep 1, 2017	Language Identification, Native Language Identification	Unverified
The RATS Collection: Supporting HLT Research with Degraded Audio Data	May 1, 2014	Action Detection, Activity Detection	Unverified
The Role of Emotions in Native Language Identification	Oct 1, 2018	Deception Detection, Language Identification	Unverified
The Story of the Characters, the DNA and the Native Language	Jun 1, 2013	Language Identification, Text Categorization	Unverified
The Titans at SemEval-2019 Task 6: Offensive Language Identification, Categorization and Target Identification	Jun 1, 2019	Language Identification	Unverified
TLAXCALA: a multilingual corpus of independent news	May 1, 2014	Language Identification, Machine Translation	Unverified
Token Masking Improves Transformer-Based Text Classification	May 16, 2025	Attribute, Classification	Unverified
Towards a Common Speech Analysis Engine	Mar 1, 2022	Emotion Recognition, Language Identification	Unverified
Towards End-to-End Code-Switching Speech Recognition	Oct 31, 2018	Automatic Speech Recognition, Automatic Speech Recognition (ASR)	Unverified
Towards Generalized Offensive Language Identification	Jul 26, 2024	Language Identification	Unverified
Towards Language Technology for Mi'kmaq	May 1, 2018	Language Identification, Language Modelling	Unverified
Towards Relevance and Sequence Modeling in Language Recognition	Apr 2, 2020	Language Identification, Speaker Recognition	Unverified
Towards spoken dialect identification of Irish	Jul 14, 2023	Dialect Identification, Language Identification	Unverified
Unified model for code-switching speech recognition and language identification based on a concatenated tokenizer	Jun 14, 2023	Automatic Speech Recognition, Automatic Speech Recognition (ASR)	Unverified
Transducer-based language embedding for spoken language identification	Apr 8, 2022	Language Identification, Spoken language identification	Unverified
Transductive Learning with String Kernels for Cross-Domain Text Classification	Nov 2, 2018	Classification, Cross-Domain Text Classification	Unverified
Transformer-based Model for Word Level Language Identification in Code-mixed Kannada-English Texts	Nov 26, 2022	Language Identification	Unverified
Translated Texts Under the Lens: From Machine Translation Detection to Source Language Identification	Jan 16, 2022	Language Identification, Machine Translation	Unverified
Translationese: Between Human and Machine Translation	Dec 1, 2016	Language Identification, Machine Translation	Unverified
Transliteration Better than Translation? Answering Code-mixed Questions over a Knowledge Base	Jul 1, 2018	Automatic Speech Recognition (ASR), Information Retrieval	Unverified

Datasets: VoxLingua107, GlotLID-C, Nordic Language Identification, OpenSubtitles, Universal Dependencies, VoxForge

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	wav2vec 2.0 LV-60K	Error rate	7.2	—	Unverified
2	XLS-R	Error rate	5.7	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	GlotLID	Macro F1	0.98	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	FastText	Accuracy	0.97	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	Apple bi-LSTM	Accuracy	91.37	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	Apple bi-LSTM	Accuracy	86.93	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	ConformerG-P	Accuracy	99.8	—	Unverified
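The benchmark tables above report error rate, accuracy, and macro F1. As a rough illustration (not the evaluation code behind any of the listed results), the sketch below shows how these metrics are commonly computed from per-item language predictions. The function names are hypothetical, and taking the macro-F1 label set as the union of gold and predicted labels is an assumption; individual benchmarks may define it differently. Note also that the claimed values mix scales: some are fractions (e.g. 0.97) while others appear to be percentages (e.g. 91.37).

```python
def accuracy(gold: list[str], pred: list[str]) -> float:
    """Fraction of items whose predicted language matches the gold label."""
    if len(gold) != len(pred) or not gold:
        raise ValueError("gold and pred must be non-empty and the same length")
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def error_rate(gold: list[str], pred: list[str]) -> float:
    """Complement of accuracy (as a fraction, not a percentage)."""
    return 1.0 - accuracy(gold, pred)


def macro_f1(gold: list[str], pred: list[str]) -> float:
    """Unweighted mean of per-language F1 over all labels seen in gold or pred."""
    labels = sorted(set(gold) | set(pred))  # assumption: union of label sets
    scores = []
    for lab in labels:
        tp = sum(g == lab and p == lab for g, p in zip(gold, pred))
        fp = sum(g != lab and p == lab for g, p in zip(gold, pred))
        fn = sum(g == lab and p != lab for g, p in zip(gold, pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores.append(f1)
    # Every language counts equally, regardless of how many items it has.
    return sum(scores) / len(scores)


gold = ["en", "en", "de", "fr"]
pred = ["en", "de", "de", "fr"]
print(accuracy(gold, pred))    # 0.75
print(error_rate(gold, pred))  # 0.25
print(macro_f1(gold, pred))    # about 0.778, i.e. 7/9
```

Macro averaging weights each language equally, so a model that does well on frequent languages but poorly on rare ones scores lower on macro F1 than on plain accuracy.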