Language Identification

Language identification is the task of determining which language a text is written in; spoken language identification does the same for an audio recording.
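As a rough illustration of the task for text, the sketch below scores an input against per-language character-trigram counts with add-one smoothing and returns the best-scoring language. It is a toy example under stated assumptions, not the method of any paper listed here: the training sentences, the language codes, and the helper names (`trigrams`, `train`, `identify`) are all made up for illustration.

```python
from collections import Counter
import math

# Toy training text per language. Purely illustrative; not taken from
# any dataset named on this page.
TRAIN = {
    "en": "the quick brown fox jumps over the lazy dog and this is the language of the text",
    "de": "der schnelle braune fuchs springt über den faulen hund und das ist die sprache des textes",
    "es": "el rápido zorro marrón salta sobre el perro perezoso y este es el idioma del texto",
}


def trigrams(text):
    """Return the overlapping character trigrams of a lowercased, space-padded string."""
    padded = f"  {text.lower()} "
    return [padded[i:i + 3] for i in range(len(padded) - 2)]


def train(samples):
    """Build one trigram-count profile per language."""
    return {lang: Counter(trigrams(text)) for lang, text in samples.items()}


def identify(text, profiles):
    """Return the language whose profile gives the text the highest log-likelihood."""
    # Shared vocabulary size, used for add-one (Laplace) smoothing.
    vocab_size = len(set().union(*profiles.values()))

    def score(counts):
        total = sum(counts.values())
        return sum(
            math.log((counts[gram] + 1) / (total + vocab_size))
            for gram in trigrams(text)
        )

    return max(profiles, key=lambda lang: score(profiles[lang]))


profiles = train(TRAIN)
print(identify("the dog is over there", profiles))
```

With so little training text the profiles are only meaningful for inputs that resemble the toy sentences; practical systems train on far larger corpora.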

Papers

Showing 601–650 of 794 papers

Title	Date	Tasks	Status
OLR 2021 Challenge: Datasets, Rules and Baselines	Jul 23, 2021	Automatic Speech Recognition, Automatic Speech Recognition (ASR)	Unverified
On-Device Language Identification of Text in Images using Diacritic Characters	Nov 10, 2020	Language Identification, object-detection	Unverified
On The Performance of Time-Pooling Strategies for End-to-End Spoken Language Identification	May 1, 2020	Language Identification, Representation Learning	Unverified
On the use of Performer and Agent Attention for Spoken Language Identification	Feb 9, 2025	Language Identification, Self-Supervised Learning	Unverified
Open-Set Language Identification	Jul 16, 2017	General Classification, Language Identification	Unverified
Optimizing a Supervised Classifier for a Difficult Language Identification Problem	Apr 1, 2021	Language Identification, regression	Unverified
OpusFilter: A Configurable Parallel Corpus Filtering Toolbox	Jul 1, 2020	Domain Adaptation, Language Identification	Unverified
OpusTools and Parallel Corpus Diagnostics	May 1, 2020	Language Identification	Unverified
Oracle and Human Baselines for Native Language Identification	Jun 1, 2015	Language Identification, Native Language Identification	Unverified
Oriental Language Recognition (OLR) 2020: Summary and Analysis	Jul 5, 2021	Dialect Identification, Language Identification	Unverified
Overview for the First Shared Task on Language Identification in Code-Switched Data	Oct 1, 2014	Language Identification	Unverified
Overview for the Second Shared Task on Language Identification in Code-Switched Data	Sep 28, 2019	Language Identification, Single Particle Analysis	Unverified
Overview of the DSL Shared Task 2015	Sep 1, 2015	Language Identification	Unverified
Overview of the HASOC Subtrack at FIRE 2022: Offensive Language Identification in Marathi	Nov 18, 2022	Language Identification	Unverified
OWSM-CTC: An Open Encoder-Only Speech Foundation Model for Speech Recognition, Translation, and Language Identification	Feb 20, 2024	Automatic Speech Recognition, Automatic Speech Recognition (ASR)	Unverified
Parsing Learner Text: to Shoehorn or not to Shoehorn	Jun 1, 2015	Language Identification	Unverified
Partial Coupling of Optimal Transport for Spoken Language Identification	Mar 31, 2022	Domain Adaptation, Language Identification	Unverified
Part of Speech Annotation of a Turkish-German Code-Switching Corpus	Aug 1, 2016	Language Identification	Unverified
Part-of-Speech Tagging for Code-Switched, Transliterated Texts without Explicit Language Identification	Oct 1, 2018	Language Identification, Part-Of-Speech Tagging	Unverified
Part-of-speech Tagging of Code-mixed Social Media Content: Pipeline, Stacking and Joint Modelling	Nov 1, 2016	Language Identification, Part-Of-Speech Tagging	Unverified
Part-of-speech Tagging of Code-Mixed Social Media Text	Nov 1, 2016	Language Identification, Part-Of-Speech Tagging	Unverified
PGSG at SemEval-2020 Task 12: BERT-LSTM with Tweets' Pretrained Model and Noisy Student Training Method	Dec 1, 2020	Language Identification	Unverified
Phone-aware Neural Language Identification	May 9, 2017	Language Identification	Unverified
Phonetic Temporal Neural Model for Language Identification	May 9, 2017	Language Identification, model	Unverified
Pin_cod_ at SemEval-2020 Task 12: Injecting Lexicons into Bidirectional Long Short-Term Memory Networks to Detect Turkish Offensive Tweets	Dec 1, 2020	Language Identification	Unverified
POS Tagging of English-Hindi Code-Mixed Social Media Content	Oct 1, 2014	Language Identification, POS	Unverified
POS Tagging of Hindi-English Code Mixed Text from Social Media: Some Machine Learning Experiments	Dec 1, 2015	BIG-bench Machine Learning, Language Identification	Unverified
Predicting Code-switching in Multilingual Communication for Immigrant Communities	Oct 1, 2014	Language Identification	Unverified
Predicting Foreign Language Usage from English-Only Social Media Posts	Jun 1, 2018	Cross-Lingual Transfer, Language Identification	Unverified
Pretraining Approaches for Spoken Language Recognition: TalTech Submission to the OLR 2021 Challenge	May 14, 2022	Automatic Speech Recognition, Automatic Speech Recognition (ASR)	Unverified
PRHLT-UPV at SemEval-2020 Task 12: BERT for Multilingual Offensive Language Detection	Dec 1, 2020	Language Identification	Unverified
professionals@DravidianLangTech-EACL2021: Malayalam Offensive Language Identification - A Minimalistic Approach	Apr 1, 2021	Language Identification	Unverified
Prompt Engineering Using GPT for Word-Level Code-Mixed Language Identification in Low-Resource Dravidian Languages	Nov 6, 2024	Information Retrieval, Language Identification	Unverified
PUM at SemEval-2020 Task 12: Aggregation of Transformer-based models' features for offensive language recognition	Oct 5, 2020	Language Identification	Unverified
Punctuation as Native Language Interference	Aug 1, 2018	Classification, Cross-corpus	Unverified
Query log analysis with LangLog	Apr 1, 2012	Language Identification	Unverified
Racial Disparity in Natural Language Processing: A Case Study of Social Media African-American English	Jun 30, 2017	Fairness, Language Identification	Unverified
Rapid Language Adaptation for Multilingual E2E Speech Recognition Using Encoder Prompting	Jun 18, 2024	Decoder, Language Identification	Unverified
Recognizing English Learners' Native Language from Their Writings	Jun 1, 2013	Language Identification, Text Classification	Unverified
Reconstructing an Indo-European Family Tree from Non-native English Texts	Aug 1, 2013	Language Identification	Unverified
Recursive Semantic Anchoring in ISO 639:2023: A Structural Extension to ISO/TC 37 Frameworks	Jun 7, 2025	Language Identification	Unverified
Rediscovering the Slavic Continuum in Representations Emerging from Neural Models of Spoken Language Identification	Oct 22, 2020	Language Identification, Spoken language identification	Unverified
Reference Scope Identification in Citing Sentences	Jun 1, 2012	Language Identification, Paraphrase Identification	Unverified
Regression or classification? Automated Essay Scoring for Norwegian	Aug 1, 2019	Automated Essay Scoring, BIG-bench Machine Learning	Unverified
Rnn-transducer with language bias for end-to-end Mandarin-English code-switching speech recognition	Feb 19, 2020	Language Identification, speech-recognition	Unverified
RoBERTweet: A BERT Language Model for Romanian Tweets	Jun 11, 2023	Language Identification, Language Modeling	Unverified
Robust, Lexicalized Native Language Identification	Dec 1, 2012	Language Identification, Native Language Identification	Unverified
Robust Open-Set Spoken Language Identification and the CU MultiLang Dataset	Aug 29, 2023	Language Identification, Spoken language identification	Unverified
Robust Speech Representation Learning via Flow-based Embedding Regularization	Dec 7, 2021	Deep Learning, Language Identification	Unverified
Romanized Berber and Romanized Arabic Automatic Language Identification Using Machine Learning	Dec 1, 2016	BIG-bench Machine Learning, Language Identification	Unverified

Benchmark Results

Dataset	#	Model	Metric	Claimed	Verified	Status
VOXLINGUA107	1	wav2vec 2.0 LV-60K	Error rate	7.2	—	Unverified
VOXLINGUA107	2	XLS-R	Error rate	5.7	—	Unverified
GlotLID-C	1	GlotLID	Macro F1	0.98	—	Unverified
Nordic Language Identification	1	FastText	Accuracy	0.97	—	Unverified
OpenSubtitles	1	Apple bi-LSTM	Accuracy	91.37	—	Unverified
Universal Dependencies	1	Apple bi-LSTM	Accuracy	86.93	—	Unverified
VoxForge	1	ConformerG-P	Accuracy	99.8	—	Unverified
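
The three metrics named in the benchmark results (accuracy, error rate, macro F1) can be computed from gold and predicted language labels roughly as follows. This is a minimal sketch under assumptions: it takes "error rate" to mean the fraction of misclassified samples, averages F1 over every label that appears in either list, and uses invented toy labels. The page does not state how each benchmark defines its metric, and the function names are illustrative.

```python
def accuracy(gold, pred):
    """Fraction of samples whose predicted label equals the gold label."""
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def error_rate(gold, pred):
    """Assumed here to be the fraction of misclassified samples."""
    return 1.0 - accuracy(gold, pred)


def macro_f1(gold, pred):
    """Unweighted mean of the per-language F1 scores."""
    labels = sorted(set(gold) | set(pred))
    scores = []
    for label in labels:
        tp = sum(g == label and p == label for g, p in zip(gold, pred))
        fp = sum(g != label and p == label for g, p in zip(gold, pred))
        fn = sum(g == label and p != label for g, p in zip(gold, pred))
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the label never occurs.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy labels, invented for illustration only.
gold = ["en", "en", "de", "de", "es", "es"]
pred = ["en", "en", "de", "en", "es", "de"]

print(f"accuracy   = {accuracy(gold, pred):.3f}")
print(f"error rate = {error_rate(gold, pred):.3f}")
print(f"macro F1   = {macro_f1(gold, pred):.3f}")
```

Macro F1 weights every language equally regardless of how many samples it has, so it penalizes poor performance on rare languages more than accuracy does.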