Language Identification

Language identification is the task of determining the language of a text.
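
As a rough illustration of the task, the sketch below identifies the language of a short string by comparing its character trigram counts against per-language profiles using cosine similarity. This is a toy baseline, not the method of any paper listed here; the sample sentences, the function names (`char_ngrams`, `cosine`, `identify`), and the three-language label set are assumptions made for the example.

```python
from collections import Counter
import math


def char_ngrams(text, n=3):
    """Count character n-grams of the lowercased text, padded with one space on each side."""
    padded = f" {text.lower()} "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def cosine(a, b):
    """Cosine similarity between two n-gram count vectors."""
    dot = sum(a[g] * b[g] for g in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical, deliberately tiny training samples; a real system needs far more text.
SAMPLES = {
    "en": "the quick brown fox jumps over the lazy dog and the cat is on the table",
    "de": "der schnelle braune fuchs springt auf den faulen hund und die katze ist auf dem tisch",
    "fr": "le renard brun rapide saute sur le chien paresseux et le chat est sur la table",
}
PROFILES = {lang: char_ngrams(text) for lang, text in SAMPLES.items()}


def identify(text):
    """Return the language code whose profile is most similar to the input text."""
    query = char_ngrams(text)
    return max(PROFILES, key=lambda lang: cosine(query, PROFILES[lang]))


print(identify("the dog is on the table"))
print(identify("der hund und die katze"))
```

With profiles this small the classifier is only reliable on text that resembles its samples. Short, code-mixed, or closely related inputs are much harder cases.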

Papers

Showing 301–350 of 794 papers

Title	Date	Tasks	Status
Cognate and Misspelling Features for Natural Language Identification	Jun 1, 2013	Language Identification	Unverified
Codewithzichao@DravidianLangTech-EACL2021: Exploring Multilingual Transformers for Offensive Language Identification on Code Mixing Text	Apr 1, 2021	Language Identification	Unverified
A survey on phrase structure learning methods for text classification	Jun 21, 2014	Classification, General Classification	Unverified
Code-Switching Ubique Est - Language Identification and Part-of-Speech Tagging for Historical Mixed Text	Aug 1, 2016	Language Identification, Part-Of-Speech Tagging	Unverified
Gender Prediction in English-Hindi Code-Mixed Social Media Content: Corpus and Baseline System	Jun 14, 2018	Author Profiling, Gender Prediction	Unverified
Codeswitching language identification using Subword Information Enriched Word Vectors	Nov 1, 2016	Language Identification, Named Entity Recognition (NER)	Unverified
A Study on Spoken Language Identification using Deep Neural Networks	Sep 15, 2020	Language Identification, Spoken language identification	Unverified
American Sign Language Identification Using Hand Trackpoint Analysis	Oct 20, 2020	BIG-bench Machine Learning, Language Identification	Unverified
Garain at SemEval-2020 Task 12: Sequence based Deep Learning for Categorizing Offensive Language in Social Media	Sep 2, 2020	Language Identification	Unverified
Galileo at SemEval-2020 Task 12: Multi-lingual Learning for Offensive Language Identification using Pre-trained Language Models	Oct 7, 2020	All, Knowledge Distillation	Unverified
Fusion of Simple Models for Native Language Identification	Sep 1, 2017	Information Retrieval, Language Identification	Unverified
Generation through the lens of learning theory	Oct 17, 2024	Language Identification, Learning Theory	Unverified
Code-Switched Named Entity Recognition with Embedding Attention	Jul 1, 2018	Language Identification, named-entity-recognition	Unverified
Generative linguistic representation for spoken language identification	Dec 18, 2023	Decoder, Language Identification	Unverified
Fumbling in Babel: An Investigation into ChatGPT's Language Identification Ability	Nov 16, 2023	Language Identification	Unverified
Fully Connected Neural Network with Advance Preprocessor to Identify Aggression over Facebook and Twitter	Aug 1, 2018	Aggression Identification, Hate Speech Detection	Unverified
Code Switched and Code Mixed Speech Recognition for Indic languages	Mar 30, 2022	Automatic Speech Recognition, Automatic Speech Recognition (ASR)	Unverified
GlobalPhone: Pronunciation Dictionaries in 20 Languages	May 1, 2014	Language Identification, Language Modelling	Unverified
From Visualisation to Hypothesis Construction for Second Language Acquisition	Oct 1, 2014	Language Acquisition, Language Identification	Unverified
Code Mixing: A Challenge for Language Identification in the Language of Social Media	Oct 1, 2014	Language Identification	Unverified
GLUECoS: An Evaluation Benchmark for Code-Switched NLP	Apr 26, 2020	Language Identification, named-entity-recognition	Unverified
GLUECoS: An Evaluation Benchmark for Code-Switched NLP	Jul 1, 2020	Language Identification, named-entity-recognition	Unverified
HAD-Tübingen at SemEval-2019 Task 6: Deep Learning Analysis of Offensive Language on Twitter: Identification and Categorization	Jun 1, 2019	Language Identification	Unverified
HaT5: Hate Language Identification using Text-to-Text Transfer Transformer	Feb 11, 2022	Data Augmentation, Explainable artificial intelligence	Unverified
ASIREM Participation at the Discriminating Similar Languages Shared Task 2016	Dec 1, 2016	Dialect Identification, Language Identification	Unverified
A Compact End-to-End Model with Local and Global Context for Spoken Language Identification	Oct 27, 2022	Language Identification, Spoken language identification	Unverified
HeLI-based Experiments in Discriminating Between Dutch and Flemish Subtitles	Aug 1, 2018	Clustering, Language Identification	Unverified
HeLI-based Experiments in Swiss German Dialect Identification	Aug 1, 2018	Dialect Identification, Language Identification	Unverified
Advancing Linguistic Features and Insights by Label-informed Feature Grouping: An Exploration in the Context of Native Language Identification	Dec 1, 2016	Clustering, Language Acquisition	Unverified
HHU at SemEval-2019 Task 6: Context Does Matter - Tackling Offensive Language Identification and Categorization with ELMo	Jun 1, 2019	Language Identification	Unverified
From Language to Family and Back: Native Language and Language Family Identification from English Text	Jun 1, 2013	Language Identification	Unverified
Hindi-English Code-Switching Speech Corpus	Sep 24, 2018	Automatic Speech Recognition, Automatic Speech Recognition (ASR)	Unverified
Hitachi at SemEval-2020 Task 12: Offensive Language Identification with Noisy Labels using Statistical Sampling and Post-Processing	May 1, 2020	Language Identification, Position	Unverified
HUB@DravidianLangTech-EACL2021: Identify and Classify Offensive Text in Multilingual Code Mixing in Social Media	Apr 1, 2021	Classification, Language Identification	Unverified
Huqariq: A Multilingual Speech Corpus of Native Languages of Peru for Speech Recognition	Jul 12, 2022	Automatic Speech Recognition, Automatic Speech Recognition (ASR)	Unverified
Huqariq: A Multilingual Speech Corpus of Native Languages of Peru for Speech Recognition	Jun 1, 2022	Automatic Speech Recognition, Automatic Speech Recognition (ASR)	Unverified
Hypers@DravidianLangTech-EACL2021: Offensive language identification in Dravidian code-mixed YouTube Comments and Posts	Apr 1, 2021	Language Identification	Unverified
CN-HIT-MI.T at SemEval-2019 Task 6: Offensive Language Identification Based on BiLSTM with Double Attention	Jun 1, 2019	Language Identification	Unverified
From 'Solved Problems' to New Challenges: A Report on LDC Activities	May 1, 2018	Dialogue Management, Language Identification	Unverified
Identification of Indian Languages using Ghost-VLAD pooling	Feb 5, 2020	Language Identification	Unverified
Identification of Languages in Algerian Arabic Multilingual Documents	Apr 1, 2017	Chunking, General Classification	Unverified
Identification/Segmentation of Indian Regional Languages with Singular Value Decomposition based Feature Embedding	May 17, 2020	Language Identification, Segmentation	Unverified
Identifying Languages at the Word Level in Code-Mixed Indian Social Media Text	Dec 1, 2014	Language Identification	Unverified
IIITG-ADBU at SemEval-2020 Task 12: Comparison of BERT and BiLSTM in Detecting Offensive Language	Dec 1, 2020	Language Identification, World Knowledge	Unverified
Fluency detection on communication networks	Nov 1, 2016	Language Identification, Part-Of-Speech Tagging	Unverified
CLUZH at VarDial GDI 2017: Testing a Variety of Machine Learning Tools for the Classification of Swiss German Dialects	Apr 1, 2017	General Classification, Language Identification	Unverified
IIT (BHU) System for Indo-Aryan Language Identification (ILI) at VarDial 2018	Aug 1, 2018	Language Identification, Machine Translation	Unverified
IITP-AINLPML at SemEval-2020 Task 12: Offensive Tweet Identification and Target Categorization in a Multitask Environment	Dec 1, 2020	Language Identification	Unverified
(Im)possibility of Automated Hallucination Detection in Large Language Models	Apr 23, 2025	Hallucination, Language Identification	Unverified
A Simple and Efficient Probabilistic Language model for Code-Mixed Text	Jun 29, 2021	Information Retrieval, Language Identification	Unverified


Benchmark Results

VOXLINGUA107

#	Model	Metric	Claimed	Verified	Status
1	wav2vec 2.0 LV-60K	Error rate	7.2	—	Unverified
2	XLS-R	Error rate	5.7	—	Unverified

GlotLID-C

#	Model	Metric	Claimed	Verified	Status
1	GlotLID	Macro F1	0.98	—	Unverified

Nordic Language Identification

#	Model	Metric	Claimed	Verified	Status
1	FastText	Accuracy	0.97	—	Unverified

OpenSubtitles

#	Model	Metric	Claimed	Verified	Status
1	Apple bi-LSTM	Accuracy	91.37	—	Unverified

Universal Dependencies

#	Model	Metric	Claimed	Verified	Status
1	Apple bi-LSTM	Accuracy	86.93	—	Unverified

VoxForge

#	Model	Metric	Claimed	Verified	Status
1	ConformerG-P	Accuracy	99.8	—	Unverified
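
The Metric column above names accuracy, error rate, and macro F1. The sketch below shows one common way to compute them from gold and predicted language labels. It is a minimal illustration: the label lists are made up, and treating error rate as one minus accuracy is an assumption, since the source does not say how each benchmark defines or scales its scores (some values above look like fractions, others like percentages).

```python
def accuracy(y_true, y_pred):
    """Fraction of items whose predicted label equals the gold label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def error_rate(y_true, y_pred):
    """Assumed here to be 1 - accuracy; individual benchmarks may define it differently."""
    return 1.0 - accuracy(y_true, y_pred)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-label F1, so rare languages count as much as common ones."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Made-up gold and predicted labels for six short texts.
gold = ["en", "en", "de", "de", "fr", "fr"]
pred = ["en", "en", "de", "en", "fr", "de"]

print(f"accuracy   = {accuracy(gold, pred):.4f}")
print(f"error rate = {error_rate(gold, pred):.4f}")
print(f"macro F1   = {macro_f1(gold, pred):.4f}")
```

Macro averaging matters for language identification because label sets are often highly imbalanced: a model can reach high accuracy while performing poorly on low-resource languages.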