Language Identification

Language identification is the task of determining the language of a text.
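As a purely illustrative sketch of the task defined above, the snippet below guesses the language of a short text by comparing its character trigrams against tiny per-language profiles. The sample sentences, the `char_ngrams` and `identify` helpers, and the overlap score are all assumptions made for this example; they are not taken from any paper or benchmark listed on this page, and real systems train on far larger corpora.

```python
from collections import Counter

def char_ngrams(text, n=3):
    """Count character n-grams of a lowercased, space-padded text."""
    padded = f" {text.lower()} "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))

# Hypothetical toy training data: one sentence per language.
SAMPLES = {
    "en": "the quick brown fox jumps over the lazy dog and this is the house of the king",
    "de": "der schnelle braune fuchs springt über den faulen hund und das ist das haus des königs",
    "es": "el rápido zorro marrón salta sobre el perro perezoso y esta es la casa del rey",
}

# One trigram profile per language, built from the toy samples.
PROFILES = {lang: char_ngrams(text) for lang, text in SAMPLES.items()}

def identify(text):
    """Return the language whose profile shares the most trigrams with the text."""
    query = char_ngrams(text)

    def overlap(profile):
        # Counter returns 0 for trigrams the profile has never seen.
        return sum(min(count, profile[gram]) for gram, count in query.items())

    return max(PROFILES, key=lambda lang: overlap(PROFILES[lang]))

print(identify("the dog is in the house"))
print(identify("das haus und der hund"))
print(identify("la casa del perro"))
```

With profiles this small the sketch only works on text that resembles the samples; it is meant to show the shape of the problem (text in, language label out), not a competitive method.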

Papers


Showing 401–450 of 794 papers

Title	Date	Tasks	Status
From Visualisation to Hypothesis Construction for Second Language Acquisition	Oct 1, 2014	Language Acquisition, Language Identification	Unverified
Fully Connected Neural Network with Advance Preprocessor to Identify Aggression over Facebook and Twitter	Aug 1, 2018	Aggression Identification, Hate Speech Detection	Unverified
Fumbling in Babel: An Investigation into ChatGPT's Language Identification Ability	Nov 16, 2023	Language Identification	Unverified
Fusion of Simple Models for Native Language Identification	Sep 1, 2017	Information Retrieval, Language Identification	Unverified
Galileo at SemEval-2020 Task 12: Multi-lingual Learning for Offensive Language Identification using Pre-trained Language Models	Oct 7, 2020	All, Knowledge Distillation	Unverified
Garain at SemEval-2020 Task 12: Sequence based Deep Learning for Categorizing Offensive Language in Social Media	Sep 2, 2020	Language Identification	Unverified
Gender Prediction in English-Hindi Code-Mixed Social Media Content: Corpus and Baseline System	Jun 14, 2018	Author Profiling, Gender Prediction	Unverified
Generation through the lens of learning theory	Oct 17, 2024	Language Identification, Learning Theory	Unverified
Generative linguistic representation for spoken language identification	Dec 18, 2023	Decoder, Language Identification	Unverified
GlobalPhone: Pronunciation Dictionaries in 20 Languages	May 1, 2014	Language Identification, Language Modelling	Unverified
GLUECoS: An Evaluation Benchmark for Code-Switched NLP	Apr 26, 2020	Language Identification, named-entity-recognition	Unverified
GLUECoS: An Evaluation Benchmark for Code-Switched NLP	Jul 1, 2020	Language Identification, named-entity-recognition	Unverified
HAD-Tübingen at SemEval-2019 Task 6: Deep Learning Analysis of Offensive Language on Twitter: Identification and Categorization	Jun 1, 2019	Language Identification	Unverified
HaT5: Hate Language Identification using Text-to-Text Transfer Transformer	Feb 11, 2022	Data Augmentation, Explainable artificial intelligence	Unverified
HeLI-based Experiments in Discriminating Between Dutch and Flemish Subtitles	Aug 1, 2018	Clustering, Language Identification	Unverified
HeLI-based Experiments in Swiss German Dialect Identification	Aug 1, 2018	Dialect Identification, Language Identification	Unverified
HeLI-OTS, Off-the-shelf Language Identifier for Text	Jun 1, 2022	Language Identification	Unverified
HHU at SemEval-2019 Task 6: Context Does Matter - Tackling Offensive Language Identification and Categorization with ELMo	Jun 1, 2019	Language Identification	Unverified
Hindi-English Code-Switching Speech Corpus	Sep 24, 2018	Automatic Speech Recognition, Automatic Speech Recognition (ASR)	Unverified
Hitachi at SemEval-2020 Task 12: Offensive Language Identification with Noisy Labels using Statistical Sampling and Post-Processing	May 1, 2020	Language Identification, Position	Unverified
HUB@DravidianLangTech-EACL2021: Identify and Classify Offensive Text in Multilingual Code Mixing in Social Media	Apr 1, 2021	Classification, Language Identification	Unverified
Huqariq: A Multilingual Speech Corpus of Native Languages of Peru for Speech Recognition	Jul 12, 2022	Automatic Speech Recognition, Automatic Speech Recognition (ASR)	Unverified
Huqariq: A Multilingual Speech Corpus of Native Languages of Peru for Speech Recognition	Jun 1, 2022	Automatic Speech Recognition, Automatic Speech Recognition (ASR)	Unverified
Hypers@DravidianLangTech-EACL2021: Offensive language identification in Dravidian code-mixed YouTube Comments and Posts	Apr 1, 2021	Language Identification	Unverified
iCompass at SemEval-2020 Task 12: From a Syntax-ignorant N-gram Embeddings Model to a Deep Bidirectional Language Model	Dec 1, 2020	Language Identification, Language Modeling	Unverified
Identification of Indian Languages using Ghost-VLAD pooling	Feb 5, 2020	Language Identification	Unverified
Identification of Languages in Algerian Arabic Multilingual Documents	Apr 1, 2017	Chunking, General Classification	Unverified
Identification/Segmentation of Indian Regional Languages with Singular Value Decomposition based Feature Embedding	May 17, 2020	Language Identification, Segmentation	Unverified
Identifying Languages at the Word Level in Code-Mixed Indian Social Media Text	Dec 1, 2014	Language Identification	Unverified
IIITG-ADBU at SemEval-2020 Task 12: Comparison of BERT and BiLSTM in Detecting Offensive Language	Dec 1, 2020	Language Identification, World Knowledge	Unverified
IIT (BHU) System for Indo-Aryan Language Identification (ILI) at VarDial 2018	Aug 1, 2018	Language Identification, Machine Translation	Unverified
IITP-AINLPML at SemEval-2020 Task 12: Offensive Tweet Identification and Target Categorization in a Multitask Environment	Dec 1, 2020	Language Identification	Unverified
(Im)possibility of Automated Hallucination Detection in Large Language Models	Apr 23, 2025	Hallucination, Language Identification	Unverified
Improved Language Identification Through Cross-Lingual Self-Supervised Learning	Jul 8, 2021	Automatic Speech Recognition, Automatic Speech Recognition (ASR)	Unverified
Improving Cuneiform Language Identification with BERT	Jun 1, 2019	Language Identification	Unverified
Improving Informally Romanized Language Identification	Apr 30, 2025	Language Identification	Unverified
Improving Language Identification for Multilingual Speakers	Jan 29, 2020	Language Identification, Spoken language identification	Unverified
Improving Language Identification of Accented Speech	Mar 31, 2022	Language Identification, speech-recognition	Unverified
Improving Multilingual ASR in the Wild Using Simple N-best Re-ranking	Sep 27, 2024	Automatic Speech Recognition, Automatic Speech Recognition (ASR)	Unverified
Improving Multilingual Speech Models on ML-SUPERB 2.0: Fine-tuning with Data Augmentation and LID-Aware CTC	May 30, 2025	Automatic Speech Recognition, Automatic Speech Recognition (ASR)	Unverified
Improving Native Language Identification by Using Spelling Errors	Jul 1, 2017	Language Identification, Native Language Identification	Unverified
Improving Native Language Identification with TF-IDF Weighting	Jun 1, 2013	Language Acquisition, Language Identification	Unverified
Improving the accuracy of pronunciation lexicon using Naive Bayes classifier with character n-gram as feature: for language classified pronunciation lexicon generation	Dec 1, 2014	Language Identification	Unverified
Improving the Character Ngram Model for the DSL Task with BM25 Weighting and Less Frequently Used Feature Sets	Apr 1, 2017	Dialect Identification, Language Identification	Unverified
Improving the exploitation of linguistic annotations in ELAN	May 1, 2014	Language Identification	Unverified
Improving the results of string kernels in sentiment analysis and Arabic dialect identification by adapting them to your test set	Aug 25, 2018	Dialect Identification, General Classification	Unverified
Incorporating Dialectal Variability for Socially Equitable Language Identification	Jul 1, 2017	Diversity, Language Identification	Unverified
Incremental N-gram Approach for Language Identification in Code-Switched Text	Oct 1, 2014	Language Identification	Unverified
Indonesian-English Code-Switching Speech Synthesizer Utilizing Multilingual STEN-TTS and Bert LID	Dec 26, 2024	Language Identification, text-to-speech	Unverified
Influence of Mother Tongue on English Accent	Dec 1, 2014	Language Identification, Speaker Recognition	Unverified


Benchmark Results

Dataset	#	Model	Metric	Claimed	Verified	Status
VOXLINGUA107	1	wav2vec 2.0 LV-60K	Error rate	7.2	—	Unverified
VOXLINGUA107	2	XLS-R	Error rate	5.7	—	Unverified
GlotLID-C	1	GlotLID	Macro F1	0.98	—	Unverified
Nordic Language Identification	1	FastText	Accuracy	0.97	—	Unverified
OpenSubtitles	1	Apple bi-LSTM	Accuracy	91.37	—	Unverified
Universal Dependencies	1	Apple bi-LSTM	Accuracy	86.93	—	Unverified
VoxForge	1	ConformerG-P	Accuracy	99.8	—	Unverified