Language Identification

Language identification is the task of determining the language of a text.
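
As a rough illustration of the task, the sketch below identifies the language of a short text by comparing its character trigram counts with per-language profiles. It is a minimal toy under stated assumptions, not the method of any paper listed here: the function names (char_ngrams, train, cosine, identify) and the two-language training sample are hypothetical and invented for this example.

```python
from collections import Counter
import math


def char_ngrams(text, n=3):
    """Count character n-grams of the lowercased text, padded with spaces."""
    padded = f" {text.lower()} "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def train(samples):
    """Build one n-gram count profile per language from {lang: [sentences]}."""
    profiles = {}
    for lang, sentences in samples.items():
        counts = Counter()
        for sentence in sentences:
            counts.update(char_ngrams(sentence))
        profiles[lang] = counts
    return profiles


def cosine(a, b):
    """Cosine similarity between two n-gram count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items() if gram in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def identify(text, profiles):
    """Return the language whose profile is most similar to the text."""
    query = char_ngrams(text)
    return max(profiles, key=lambda lang: cosine(query, profiles[lang]))


# Toy training data, invented for illustration only (assumption).
TOY_SAMPLES = {
    "en": [
        "the quick brown fox jumps over the lazy dog",
        "this is a short english sentence",
    ],
    "de": [
        "der schnelle braune fuchs springt über den faulen hund",
        "dies ist ein kurzer deutscher satz",
    ],
}
PROFILES = train(TOY_SAMPLES)

print(identify("the dog is lazy", PROFILES))
print(identify("der hund ist faul", PROFILES))
```

Real systems train on far more data and many more languages; this sketch only shows the basic shape of the problem as a classification over character statistics.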

Papers

Showing 101–150 of 794 papers

Title	Date	Tasks	Status	Score
AfriHuBERT: A self-supervised speech representation model for African languages	Sep 30, 2024	Automatic Speech Recognition, Automatic Speech Recognition (ASR)	Code Available	5
Automatic Language Identification in Texts: A Survey	Apr 22, 2018	Language Identification, Survey	Code Available	5
IIITK@DravidianLangTech-EACL2021: Offensive Language Identification and Meme Classification in Tamil, Malayalam and Kannada	Apr 17, 2021	Classification, Language Identification	Code Available	5
Script-Agnostic Language Identification	Jun 25, 2024	Language Identification	Code Available	5
Improving Multilingual ASR in the Wild Using Simple N-best Re-ranking	Sep 27, 2024	Automatic Speech Recognition, Automatic Speech Recognition (ASR)	Code Available	5
SemEval-2019 Task 6: Identifying and Categorizing Offensive Language in Social Media (OffensEval)	Mar 19, 2019	Language Identification	Code Available	5
Aggressive Language Identification Using Word Embeddings and Sentiment Features	Aug 1, 2018	Aggression Identification, BIG-bench Machine Learning	Code Available	5
SJ_AJ@DravidianLangTech-EACL2021: Task-Adaptive Pre-Training of Multilingual BERT models for Offensive Language Identification	Feb 1, 2021	Language Identification, Language Modeling	Code Available	5
Hate-Alert@DravidianLangTech-EACL2021: Ensembling strategies for Transformer-based Offensive language Detection	Feb 19, 2021	Language Identification, Position	Code Available	5
Ghmerti at SemEval-2019 Task 6: A Deep Word- and Character-based Approach to Offensive Language Identification	Sep 22, 2020	Language Identification	Code Available	5
HeLI, a Word-Based Backoff Method for Language Identification	Dec 1, 2016	Language Identification, Position	Code Available	5
TAC at SemEval-2020 Task 12: Ensembling Approach for Multilingual Offensive Language Identification in Social Media	Dec 1, 2020	BIG-bench Machine Learning, Deep Learning	Code Available	5
Joint UD Parsing of Norwegian Bokmål and Nynorsk	May 1, 2017	Language Identification, Machine Translation	Code Available	5
GeezSwitch: Language Identification in Typologically Related Low-resourced East African Languages	Jun 1, 2022	Language Identification, Machine Translation	Code Available	5
Towards Ethical Content-Based Detection of Online Influence Campaigns	Aug 29, 2019	Language Identification, Native Language Identification	Code Available	5
Towards Offensive Language Identification for Dravidian Languages	Apr 1, 2021	Few-Shot Learning, Language Identification	Code Available	5
Finding Structure in Text, Genome and Other Symbolic Sequences	Jul 8, 2012	Information Retrieval, Language Identification	Code Available	5
Fleurs-SLU: A Massively Multilingual Benchmark for Spoken Language Understanding	Jan 10, 2025	Automatic Speech Recognition, Classification	Code Available	5
English Please: Evaluating Machine Translation with Large Language Models for Multilingual Bug Reports	Feb 20, 2025	Domain Adaptation, Language Identification	Code Available	5
End-to-end Language Identification using NetFV and NetVLAD	Sep 9, 2018	Language Identification	Code Available	5
FBK-DH at SemEval-2020 Task 12: Using Multi-channel BERT for Multilingual Offensive Language Detection	Dec 1, 2020	Language Identification, Machine Translation	Code Available	5
Using Language Learner Data for Metaphor Detection	Jun 1, 2018	Language Identification, Word Embeddings	Code Available	5
Geographic Adaptation of Pretrained Language Models	Mar 16, 2022	Language Identification, Language Modeling	Code Available	5
Distilled Non-Semantic Speech Embeddings with Binary Neural Networks for Low-Resource Devices	Jul 12, 2022	Emotion Recognition, Keyword Spotting	Code Available	5
CyberTronics at SemEval-2020 Task 12: Multilingual Offensive Language Identification over Social Media	Dec 1, 2020	Feature Engineering, Language Identification	Code Available	5
DocLangID: Improving Few-Shot Training to Identify the Language of Historical Documents	May 3, 2023	Few-Shot Learning, Language Identification	Code Available	5
Comparing the Performance of CNNs and Shallow Models for Language Identification	Apr 1, 2021	Dialect Identification, Language Identification	Code Available	5
Code-Switched Language Identification is Harder Than You Think	Feb 2, 2024	Language Identification, Sentence	Code Available	5
AdelaideCyC at SemEval-2020 Task 12: Ensemble of Classifiers for Offensive Language Detection in Social Media	Dec 1, 2020	Language Identification	Code Available	5
Combination of multiple Deep Learning architectures for Offensive Language Detection in Tweets	Mar 16, 2019	Language Identification	Code Available	5
Cross-Domain Adaptation of Spoken Language Identification for Related Languages: The Curious Case of Slavic Languages	Aug 2, 2020	Diversity, Domain Adaptation	Code Available	5
Cross-lingual Offensive Language Identification for Low Resource Languages: The Case of Marathi	Sep 8, 2021	Language Identification, Transfer Learning	Code Available	5
Discriminating between Similar Languages using Weighted Subword Features	Apr 1, 2017	Language Identification, Text Categorization	Code Available	5
Discriminating Between Similar Nordic Languages	Dec 11, 2020	BIG-bench Machine Learning, Language Identification	Code Available	5
Crawling microblogging services to gather language-classified URLs. Workflow and case study	Aug 1, 2013	Language Identification	Code Available	5
Embeddia at SemEval-2019 Task 6: Detecting Hate with Neural Network and Transfer Learning Approaches	Jun 1, 2019	Language Identification, Transfer Learning	Code Available	5
DOSA: Dravidian Code-Mixed Offensive Span Identification Dataset	Apr 1, 2021	Language Identification	Code Available	5
Enhance Language Identification using Dual-mode Model with Knowledge Distillation	Mar 7, 2022	Knowledge Distillation, Language Identification	Code Available	5
Geographically-Informed Language Identification	Mar 14, 2024	Language Identification	Code Available	5
FLEURS: Few-shot Learning Evaluation of Universal Representations of Speech	May 25, 2022	Automatic Speech Recognition, Automatic Speech Recognition (ASR)	Code Available	5
From English to Code-Switching: Transfer Learning with Strong Morphological Clues	Sep 11, 2019	Language Identification, Named Entity Recognition (NER)	Code Available	5
From N-grams to Pre-trained Multilingual Models For Language Identification	Oct 11, 2024	Language Identification, XLM-R	Code Available	5
JU_ETCE_17_21 at SemEval-2019 Task 6: Efficient Machine Learning and Neural Network Approaches for Identifying and Categorizing Offensive Language in Tweets	Jun 1, 2019	Language Identification, Word Embeddings	Code Available	5
On the End-to-End Solution to Mandarin-English Code-switching Speech Recognition	Nov 1, 2018	Data Augmentation, Language Identification	Code Available	5
Building a TOCFL Learner Corpus for Chinese Grammatical Error Diagnosis	May 1, 2018	Grammatical Error Detection, Language Acquisition	Unverified	0
Building a learner corpus for Russian	Nov 1, 2016	Language Acquisition, Language Identification	Unverified	0
Arabic Native Language Identification	Oct 1, 2014	Language Acquisition, Language Identification	Unverified	0
{bs,hr,sr}WaC - Web Corpora of Bosnian, Croatian and Serbian	Apr 1, 2014	Language Identification, Language Modelling	Unverified	0
BRUMS at SemEval-2020 Task 12: Transformer Based Multilingual Offensive Language Identification in Social Media	Dec 1, 2020	Language Identification	Unverified	0
Arabic Language WEKA-Based Dialect Classifier for Arabic Automatic Speech Recognition Transcripts	Dec 1, 2016	Automatic Speech Recognition, Automatic Speech Recognition (ASR)	Unverified	0

Benchmark Results

Dataset	#	Model	Metric	Claimed	Verified	Status
VOXLINGUA107	1	wav2vec 2.0 LV-60K	Error rate	7.2	—	Unverified
VOXLINGUA107	2	XLS-R	Error rate	5.7	—	Unverified
GlotLID-C	1	GlotLID	Macro F1	0.98	—	Unverified
Nordic Language Identification	1	FastText	Accuracy	0.97	—	Unverified
OpenSubtitles	1	Apple bi-LSTM	Accuracy	91.37	—	Unverified
Universal Dependencies	1	Apple bi-LSTM	Accuracy	86.93	—	Unverified
VoxForge	1	ConformerG-P	Accuracy	99.8	—	Unverified