Language Identification

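Language identification is the task of determining which natural language a piece of text is written in. Its spoken variant, spoken language identification, determines the language of an audio recording instead.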
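To make the text variant of the task concrete, the sketch below implements one classic approach: a multinomial naive Bayes classifier over character trigrams, with add-one smoothing and uniform language priors. It is not the method of any paper listed on this page. The class name `NgramLanguageIdentifier`, the trigram size, and the toy training sentences are hypothetical choices made for illustration, and a usable system would need far more training text per language.

```python
import math
from collections import Counter


def char_ngrams(text, n=3):
    """Return overlapping character n-grams of the lowercased text.

    A space is added at each end so the first and last characters
    also appear in boundary n-grams.
    """
    padded = f" {text.lower()} "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


class NgramLanguageIdentifier:
    """Hypothetical multinomial naive Bayes over character n-grams.

    Assumes uniform priors over languages and add-one (Laplace) smoothing.
    """

    def __init__(self, n=3):
        self.n = n
        self.counts = {}  # language -> Counter of n-gram counts
        self.totals = {}  # language -> total number of n-grams seen
        self.vocab = set()

    def fit(self, samples):
        """Train on an iterable of (text, language) pairs."""
        for text, lang in samples:
            grams = char_ngrams(text, self.n)
            self.counts.setdefault(lang, Counter()).update(grams)
            self.vocab.update(grams)
        self.totals = {lang: sum(c.values()) for lang, c in self.counts.items()}
        return self

    def score(self, text, lang):
        """Log-likelihood of text under one language's smoothed n-gram model."""
        counts, total = self.counts[lang], self.totals[lang]
        v = len(self.vocab) + 1  # +1 reserves probability mass for unseen n-grams
        return sum(
            math.log((counts[g] + 1) / (total + v))
            for g in char_ngrams(text, self.n)
        )

    def predict(self, text):
        """Return the language whose model assigns the highest log-likelihood."""
        return max(self.counts, key=lambda lang: self.score(text, lang))


# Toy training data (hypothetical, far too small for real use).
train = [
    ("the cat sat on the mat and the dog barked", "en"),
    ("where is the train station please", "en"),
    ("le chat est sur le tapis et le chien aboie", "fr"),
    ("où est la gare s'il vous plaît", "fr"),
    ("der hund bellt und die katze schläft", "de"),
    ("wo ist der bahnhof bitte", "de"),
]
lid = NgramLanguageIdentifier().fit(train)
print(lid.predict("the dog is on the mat"))
```

Character n-grams are a common choice here because they capture spelling patterns and function words without needing a tokenizer. That matters for short strings such as tweets, which several of the listed papers address.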

Papers


Showing 201–250 of 794 papers

Title	Date	Tasks	Status
DeepAnalyzer at SemEval-2019 Task 6: A deep learning-based ensemble method for identifying offensive tweets	Jun 1, 2019	Language Identification, Part-Of-Speech Tagging	Unverified
Deep learning-based end-to-end spoken language identification system for domain-mismatched scenario	Jun 1, 2022	Language Identification, Speaker Verification	Unverified
CUSATNLP@HASOC-Dravidian-CodeMix-FIRE2020: Identifying Offensive Language from Manglish Tweets	Oct 17, 2020	Information Retrieval, Language Identification	Unverified
DELab@IIITSM at ICON-2021 Shared Task: Identification of Aggression and Biasness Using Decision Tree	Dec 1, 2021	Language Identification	Unverified
CUSATNLP@DravidianLangTech-EACL2021: Language Agnostic Classification of Offensive Content in Tweets	Apr 1, 2021	Language Identification, Position	Unverified
Detecting Code-Switching in a Multilingual Alpine Heritage Corpus	Oct 1, 2014	Language Identification, Named Entity Recognition (NER)	Unverified
Automatic Identification of Learners' Language Background Based on Their Writing in Czech	Oct 1, 2013	Language Acquisition, Language Identification	Unverified
Detection of Similar Languages and Dialects Using Deep Supervised Autoencoder	Dec 1, 2020	Language Identification	Unverified
Detect Language of Transliterated Texts	Apr 26, 2020	Language Identification, Translation	Unverified
Developing Language-tagged Corpora for Code-switching Tweets	Jun 1, 2015	Language Identification	Unverified
Curriculum Design for Code-switching: Experiments with Language Identification and Language Modeling with Deep Neural Networks	Dec 1, 2017	Language Identification, Language Modeling	Unverified
Development of Text and Speech database for Hindi and Indian English specific to Mobile Communication environment	May 1, 2012	Language Identification, Speech Recognition	Unverified
Dialect Diversity in Text Summarization on Twitter	Jul 15, 2020	Attribute, Diversity	Unverified
Dialects Identification of Armenian Language	Jun 1, 2022	Dialect Identification, Language Identification	Unverified
CulturaX: A Cleaned, Enormous, and Multilingual Dataset for Large Language Models in 167 Languages	Sep 17, 2023	Hallucination, Language Identification	Unverified
Discriminating between Indo-Aryan Languages Using SVM Ensembles	Jul 9, 2018	Language Identification	Unverified
Discriminating between Mandarin Chinese and Swiss-German varieties using adaptive language models	Jun 1, 2019	Dialect Identification, Language Identification	Unverified
Discriminating between Similar Languages Using PPM	Sep 1, 2015	Language Identification	Unverified
Automatic Identification of Closely-related Indian Languages: Resources and Experiments	Mar 26, 2018	Language Identification	Unverified
Discriminating between Similar Languages and Arabic Dialect Identification: A Report on the Third DSL Shared Task	Dec 1, 2016	Dialect Identification, General Classification	Unverified
Discriminating between Similar Languages on Imbalanced Conversational Texts	May 1, 2018	Language Identification	Unverified
Discriminating between Similar Languages with Word-level Convolutional Neural Networks	Apr 1, 2017	Language Identification, Question Answering	Unverified
cs@DravidianLangTech-EACL2021: Offensive Language Identification Based On Multilingual BERT Model	Apr 1, 2021	Language Identification, text-classification	Unverified
Discriminating Non-Native English with 350 Words	Jun 1, 2013	Language Acquisition, Language Identification	Unverified
Discriminating Similar Languages with Linear SVMs and Neural Networks	Dec 1, 2016	Deep Learning, Language Identification	Unverified
Discriminating Similar Languages with Token-Based Backoff	Sep 1, 2015	Language Identification	Unverified
Cross-Linguistic Offensive Language Detection: BERT-Based Analysis of Bengali, Assamese, & Bodo Conversational Hateful Content from Social Media	Dec 16, 2023	Language Identification	Unverified
Automatic discovery of Latin syntactic changes	Aug 1, 2016	Language Identification	Unverified
Distinguishing Literal and Non-Literal Usage of German Particle Verbs	Jun 1, 2016	General Classification, Language Identification	Unverified
Distributed Representations of Words and Documents for Discriminating Similar Languages	Sep 1, 2015	Language Identification, Meta-Learning	Unverified
Distributional Interaction of Concreteness and Abstractness in Verb–Noun Subcategorisation	May 1, 2019	Language Identification, Object	Unverified
DKPro TC: A Java-based Framework for Supervised Learning Experiments on Textual Data	Jun 1, 2014	Language Identification, Part-Of-Speech Tagging	Unverified
DLRG@DravidianLangTech-EACL2021: Transformer based approach for Offensive Language Identification on Code-Mixed Tamil	Apr 1, 2021	Language Identification, Language Modeling	Unverified
Do Characters Abuse More Than Words?	Sep 1, 2016	Hate Speech Detection, Language Identification	Unverified
Anglicized Words and Misspelled Cognates in Native Language Identification	Aug 1, 2019	Language Identification, Native Language Identification	Unverified
A Federated Learning Approach to Privacy Preserving Offensive Language Identification	Apr 17, 2024	Federated Learning, Language Identification	Unverified
A Dataset and Classifier for Recognizing Social Media English	Sep 1, 2017	Language Identification, Language Modeling	Unverified
Accurate Pinyin-English Codeswitched Language Identification	Nov 1, 2016	Language Identification	Unverified
Cross-lingual Inductive Transfer to Detect Offensive Language	Jul 7, 2020	Language Identification, Position	Unverified
Duluth at SemEval-2020 Task 12: Offensive Tweet Identification in English with Logistic Regression	Jul 25, 2020	Language Identification, regression	Unverified
Dyn-ASR: Compact, Multilingual Speech Recognition via Spoken Language and Accent Identification	Aug 4, 2021	Automatic Speech Recognition, Automatic Speech Recognition (ASR)	Unverified
Efficient Discrimination Between Closely Related Languages	Dec 1, 2012	Document Classification, Language Identification	Unverified
Efficiently Identifying Low-Quality Language Subsets in Multilingual Datasets: A Case Study on a Large-Scale Multilingual Audio Dataset	Oct 5, 2024	Language Identification	Unverified
Emad at SemEval-2019 Task 6: Offensive Language Identification using Traditional Machine Learning and Deep Learning approaches	Jun 1, 2019	Data Augmentation, Language Identification	Unverified
Cross-domain Feature Selection for Language Identification	Nov 11, 2011	feature selection, Language Identification	Unverified
Automatic Detection of Sentence Fragments	Jul 1, 2015	Grammatical Error Correction, Language Identification	Unverified
An Exploratory Analysis of the Relation Between Offensive Language and Mental Health	May 31, 2021	Depression Detection, Language Identification	Unverified
Cross-corpus Native Language Identification via Statistical Embedding	Jun 1, 2018	Cross-corpus, Language Identification	Unverified
Enhancing Code-Switching ASR Leveraging Non-Peaky CTC Loss and Deep Language Posterior Injection	Nov 26, 2024	Automatic Speech Recognition, Automatic Speech Recognition (ASR)	Unverified
Cross-Corpora Spoken Language Identification with Domain Diversification and Generalization	Feb 10, 2023	Data Augmentation, Domain Generalization	Unverified



Datasets: VOXLINGUA107, GlotLID-C, Nordic Language Identification, OpenSubtitles, Universal Dependencies, VoxForge

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	wav2vec 2.0 LV-60K	Error rate	7.2	—	Unverified
2	XLS-R	Error rate	5.7	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	GlotLID	Macro F1	0.98	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	FastText	Accuracy	0.97	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	Apple bi-LSTM	Accuracy	91.37	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	Apple bi-LSTM	Accuracy	86.93	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	ConformerG-P	Accuracy	99.8	—	Unverified
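The leaderboards above report three metrics: error rate, accuracy, and macro F1. They also mix scales. Some scores look like fractions (0.98, 0.97), while others look like percentages (91.37, 99.8). The page does not state how each submitted number was computed, so the sketch below shows only the standard closed-set definitions. Error rate is 1 − accuracy, and macro F1 is the unweighted mean of per-language F1, where F1 = 2·TP / (2·TP + FP + FN). The function names and the toy labels are hypothetical.

```python
from collections import Counter


def accuracy(gold, pred):
    """Fraction of items whose predicted language matches the gold label."""
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def error_rate(gold, pred):
    """Closed-set classification error rate, defined here as 1 - accuracy."""
    return 1.0 - accuracy(gold, pred)


def macro_f1(gold, pred):
    """Unweighted mean of per-language F1 over labels seen in gold or pred.

    Each language's F1 is 2*TP / (2*TP + FP + FN), which equals the
    harmonic mean of its precision and recall.
    """
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted language p was wrong
            fn[g] += 1  # gold language g was missed
    labels = set(gold) | set(pred)
    f1s = []
    for lang in labels:
        denom = 2 * tp[lang] + fp[lang] + fn[lang]
        f1s.append(2 * tp[lang] / denom if denom else 0.0)
    return sum(f1s) / len(f1s)


# Toy example (hypothetical labels): one English item is misclassified as French.
gold = ["en", "en", "fr", "de"]
pred = ["en", "fr", "fr", "de"]
print(accuracy(gold, pred), error_rate(gold, pred), macro_f1(gold, pred))
```

Macro F1 gives every language equal weight regardless of how many test items it has. This is why it is often preferred for benchmarks covering many low-resource languages, where accuracy can be dominated by a few frequent ones.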