Language Identification

Language identification is the task of determining the language of a text.
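
As a rough illustration of the task, the sketch below guesses a text's language by comparing character trigram counts against per-language profiles. The training sentences, the `identify` function, and the choice of cosine similarity are all assumptions made for this example; none of it is taken from the papers listed here, and the samples are far too small for real use.

```python
from collections import Counter
import math

# Hypothetical toy training text per language (illustrative only).
SAMPLES = {
    "en": "the quick brown fox jumps over the lazy dog and this is the language of the text",
    "de": "der schnelle braune fuchs springt über den faulen hund und das ist die sprache des textes",
    "es": "el rápido zorro marrón salta sobre el perro perezoso y este es el idioma del texto",
}

def trigrams(text):
    """Count overlapping character trigrams, padding so word edges are captured."""
    padded = f"  {text.lower()} "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))

def cosine(a, b):
    """Cosine similarity between two trigram count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# One trigram profile per language, built from the toy samples above.
PROFILES = {lang: trigrams(text) for lang, text in SAMPLES.items()}

def identify(text):
    """Return the language code whose profile is most similar to the input text."""
    query = trigrams(text)
    return max(PROFILES, key=lambda lang: cosine(query, PROFILES[lang]))
```

Calling `identify("das ist der hund")` picks the profile with the highest trigram overlap. Production systems generally train on much larger corpora and cover far more languages.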

Papers

Showing 101–150 of 794 papers

Title	Date	Tasks	Status
Offensive Language Identification in Transliterated and Code-Mixed Bangla	Nov 25, 2023	Language Identification	Unverified
The Obscure Limitation of Modular Multilingual Language Models	Nov 21, 2023	Language Identification	Unverified
Fumbling in Babel: An Investigation into ChatGPT's Language Identification Ability	Nov 16, 2023	Language Identification	Unverified
OffMix-3L: A Novel Code-Mixed Dataset in Bangla-English-Hindi for Offensive Language Identification	Oct 27, 2023	Language Identification	Code Available
Advanced accent/dialect identification and accentedness assessment with multi-embedding models and automatic speech recognition	Oct 17, 2023	Automatic Speech Recognition, Automatic Speech Recognition (ASR)	Unverified
Findings of the 2023 ML-SUPERB Challenge: Pre-Training and Evaluation over More Languages and Beyond	Oct 9, 2023	Language Identification, speech-recognition	Unverified
Wavelet Scattering Transform for Improving Generalization in Low-Resourced Spoken Language Identification	Oct 1, 2023	Language Identification, Spoken language identification	Unverified
Multimodal Modeling For Spoken Language Identification	Sep 19, 2023	Language Identification, Spoken language identification	Unverified
CulturaX: A Cleaned, Enormous, and Multilingual Dataset for Large Language Models in 167 Languages	Sep 17, 2023	Hallucination, Language Identification	Unverified
Native Language Identification with Big Bird Embeddings	Sep 13, 2023	Computational Efficiency, Feature Engineering	Code Available
Robust Open-Set Spoken Language Identification and the CU MultiLang Dataset	Aug 29, 2023	Language Identification, Spoken language identification	Unverified
Fine-Tuning Llama 2 Large Language Models for Detecting Online Sexual Predatory Chats and Abusive Texts	Aug 28, 2023	Abusive Language, Fake News Detection	Unverified
Bilingual Streaming ASR with Grapheme units and Auxiliary Monolingual Loss	Aug 11, 2023	Automatic Speech Recognition, Automatic Speech Recognition (ASR)	Unverified
Turkish Native Language Identification	Jul 27, 2023	Language Identification, Native Language Identification	Unverified
MASR: Multi-label Aware Speech Representation	Jul 20, 2023	Emotion Recognition, Language Identification	Unverified
Multilingual Speech-to-Speech Translation into Multiple Target Languages	Jul 17, 2023	Language Identification, Speech-to-Speech Translation	Unverified
Towards spoken dialect identification of Irish	Jul 14, 2023	Dialect Identification, Language Identification	Unverified
Confidence-based Ensembles of End-to-End Speech Recognition Models	Jun 27, 2023	Language Identification, Model Selection	Unverified
My Boli: Code-mixed Marathi-English Corpora, Pretrained Language Models and Evaluation Benchmarks	Jun 24, 2023	Benchmarking, Hate Speech Detection	Code Available
Unified model for code-switching speech recognition and language identification based on a concatenated tokenizer	Jun 14, 2023	Automatic Speech Recognition, Automatic Speech Recognition (ASR)	Code Available
RoBERTweet: A BERT Language Model for Romanian Tweets	Jun 11, 2023	Language Identification, Language Modeling	Unverified
Leveraging Language Identification to Enhance Code-Mixed Text Classification	Jun 8, 2023	Classification, Hate Speech Detection	Unverified
Label Aware Speech Representation Learning For Language Identification	Jun 7, 2023	Language Identification, Missing Labels	Unverified
Spoken Language Identification System for English-Mandarin Code-Switching Child-Directed Speech	Jun 1, 2023	Decoder, Language Identification	Code Available
Simple yet Effective Code-Switching Language Identification with Multitask Pre-Training and Transfer Learning	May 31, 2023	Automatic Speech Recognition, Automatic Speech Recognition (ASR)	Unverified
MERLIon CCS Challenge Evaluation Plan	May 31, 2023	Language Identification, Task 2	Code Available
Investigating model performance in language identification: beyond simple error statistics	May 30, 2023	Language Identification	Code Available
MERLIon CCS Challenge: A English-Mandarin code-switching child-directed speech corpus for language identification and diarization	May 30, 2023	Language Identification	Code Available
Script Normalization for Unconventional Writing of Under-Resourced Languages in Bilingual Communities	May 25, 2023	Language Identification, Machine Translation	Code Available
LIMIT: Language Identification, Misidentification, and Translation using Hierarchical Models in 350+ Languages	May 23, 2023	Language Identification, Translation	Code Available
Multilingual Large Language Models Are Not (Yet) Code-Switchers	May 23, 2023	Benchmarking, Language Identification	Unverified
ML-SUPERB: Multilingual Speech Universal PERformance Benchmark	May 18, 2023	Automatic Speech Recognition, Language Identification	Unverified
DocLangID: Improving Few-Shot Training to Identify the Language of Historical Documents	May 3, 2023	Few-Shot Learning, Language Identification	Code Available
Lessons Learned in ATCO2: 5000 hours of Air Traffic Control Communications for Robust Automatic Speech Recognition and Understanding	May 2, 2023	Automatic Speech Recognition, Language Identification	Unverified
Approaches to Corpus Creation for Low-Resource Language Technology: the Case of Southern Kurdish and Laki	Apr 3, 2023	Language Identification	Code Available
MMT: A Multilingual and Multi-Topic Indian Social Media Dataset	Apr 2, 2023	Diversity, Language Identification	Unverified
Joint unsupervised and supervised learning for context-aware language identification	Mar 29, 2023	Automatic Speech Recognition, Automatic Speech Recognition (ASR)	Unverified
Language Variety Identification with True Labels	Mar 2, 2023	Language Identification	Code Available
Building High-accuracy Multilingual ASR with Gated Language Experts and Curriculum Training	Mar 1, 2023	Language Identification	Unverified
Augmented Transformers with Adaptive n-grams Embedding for Multilingual Scene Text Recognition	Feb 28, 2023	Language Identification, Scene Text Recognition	Unverified
Language identification as improvement for lip-based biometric visual systems	Feb 27, 2023	Language Identification	Unverified
Cross-Corpora Spoken Language Identification with Domain Diversification and Generalization	Feb 10, 2023	Data Augmentation, Domain Generalization	Unverified
A Twitter BERT Approach for Offensive Language Detection in Marathi	Dec 20, 2022	Data Augmentation, Language Identification	Unverified
An Overview of Indian Spoken Language Recognition from Machine Learning Perspective	Nov 30, 2022	Language Identification, Spoken language identification	Unverified
Transformer-based Model for Word Level Language Identification in Code-mixed Kannada-English Texts	Nov 26, 2022	Language Identification	Unverified
Predicting the Type and Target of Offensive Social Media Posts in Marathi	Nov 22, 2022	Language Identification	Code Available
Scaling Native Language Identification with Transformer Adapters	Nov 18, 2022	Language Identification, Marketing	Unverified
Overview of the HASOC Subtrack at FIRE 2022: Offensive Language Identification in Marathi	Nov 18, 2022	Language Identification	Unverified
CoLI-Machine Learning Approaches for Code-mixed Language Identification at the Word Level in Kannada-English Texts	Nov 17, 2022	Language Identification, Sentence	Unverified
Accidental Learners: Spoken Language Identification in Multilingual Self-Supervised Models	Nov 9, 2022	Language Identification, Spoken language identification	Unverified

Benchmark Results

VOXLINGUA107

#	Model	Metric	Claimed	Verified	Status
1	wav2vec 2.0 LV-60K	Error rate	7.2	-	Unverified
2	XLS-R	Error rate	5.7	-	Unverified

GlotLID-C

#	Model	Metric	Claimed	Verified	Status
1	GlotLID	Macro F1	0.98	-	Unverified

Nordic Language Identification

#	Model	Metric	Claimed	Verified	Status
1	FastText	Accuracy	0.97	-	Unverified

OpenSubtitles

#	Model	Metric	Claimed	Verified	Status
1	Apple bi-LSTM	Accuracy	91.37	-	Unverified

Universal Dependencies

#	Model	Metric	Claimed	Verified	Status
1	Apple bi-LSTM	Accuracy	86.93	-	Unverified

VoxForge

#	Model	Metric	Claimed	Verified	Status
1	ConformerG-P	Accuracy	99.8	-	Unverified