Language Identification

Language identification is the task of determining which language a given text or speech sample is in.

Papers

Showing 51–100 of 794 papers

Title	Date	Tasks	Status
NusaAksara: A Multimodal and Multilingual Benchmark for Preserving Indonesian Indigenous Scripts	Feb 25, 2025	Image Segmentation, Language Identification	Unverified
English Please: Evaluating Machine Translation with Large Language Models for Multilingual Bug Reports	Feb 20, 2025	Domain Adaptation, Language Identification	Code Available
Multi-label Scandinavian Language Identification (SLIDE)	Feb 10, 2025	Language Identification, Sentence	Code Available
On the use of Performer and Agent Attention for Spoken Language Identification	Feb 9, 2025	Language Identification, Self-Supervised Learning	Unverified
Evaluating Standard and Dialectal Frisian ASR: Multilingual Fine-tuning and Language Identification for Improved Low-resource Performance	Feb 7, 2025	Automatic Speech Recognition, Automatic Speech Recognition (ASR)	Unverified
Is It Navajo? Accurate Language Detection in Endangered Athabaskan Languages	Jan 27, 2025	Diversity, Language Identification	Code Available
Fleurs-SLU: A Massively Multilingual Benchmark for Spoken Language Understanding	Jan 10, 2025	Automatic Speech Recognition, Classification	Code Available
Indonesian-English Code-Switching Speech Synthesizer Utilizing Multilingual STEN-TTS and Bert LID	Dec 26, 2024	Language Identification, text-to-speech	Unverified
Enhancing Code-Switching ASR Leveraging Non-Peaky CTC Loss and Deep Language Posterior Injection	Nov 26, 2024	Automatic Speech Recognition, Automatic Speech Recognition (ASR)	Unverified
Exploring Facets of Language Generation in the Limit	Nov 22, 2024	Language Identification, Text Generation	Unverified
Can adversarial attacks by large language models be attributed?	Nov 12, 2024	Attribute, Language Identification	Unverified
Prompt Engineering Using GPT for Word-Level Code-Mixed Language Identification in Low-Resource Dravidian Languages	Nov 6, 2024	Information Retrieval, Language Identification	Unverified
Computational Approaches to Arabic-English Code-Switching	Oct 17, 2024	Data Augmentation, Language Identification	Unverified
Generation through the lens of learning theory	Oct 17, 2024	Language Identification, Learning Theory	Unverified
A Multi-Task Text Classification Pipeline with Natural Language Explanations: A User-Centric Evaluation in Sentiment Analysis and Offensive Language Identification in Greek Tweets	Oct 14, 2024	Feature Importance, Language Identification	Unverified
From N-grams to Pre-trained Multilingual Models For Language Identification	Oct 11, 2024	Language Identification, XLM-R	Code Available
Efficiently Identifying Low-Quality Language Subsets in Multilingual Datasets: A Case Study on a Large-Scale Multilingual Audio Dataset	Oct 5, 2024	Language Identification	Unverified
AfriHuBERT: A self-supervised speech representation model for African languages	Sep 30, 2024	Automatic Speech Recognition, Automatic Speech Recognition (ASR)	Code Available
Improving Multilingual ASR in the Wild Using Simple N-best Re-ranking	Sep 27, 2024	Automatic Speech Recognition, Automatic Speech Recognition (ASR)	Code Available
Leveraging Open-Source Large Language Models for Native Language Identification	Sep 15, 2024	Feature Engineering, Language Acquisition	Unverified
Enhancing Code-Switching Speech Recognition with LID-Based Collaborative Mixture of Experts Model	Sep 3, 2024	Language Identification, Mixture-of-Experts	Unverified
Literary and Colloquial Dialect Identification for Tamil using Acoustic Features	Aug 27, 2024	Automatic Speech Recognition, Dialect Identification	Unverified
Towards Generalized Offensive Language Identification	Jul 26, 2024	Language Identification	Unverified
A Recipe of Parallel Corpora Exploitation for Multilingual Large Language Models	Jun 29, 2024	Language Identification, Machine Translation	Unverified
SC-MoE: Switch Conformer Mixture of Experts for Unified Streaming and Non-streaming Code-Switching ASR	Jun 26, 2024	Automatic Speech Recognition, Automatic Speech Recognition (ASR)	Unverified
Script-Agnostic Language Identification	Jun 25, 2024	Language Identification	Code Available
Rapid Language Adaptation for Multilingual E2E Speech Recognition Using Encoder Prompting	Jun 18, 2024	Decoder, Language Identification	Unverified
Exploring Spoken Language Identification Strategies for Automatic Transcription of Multilingual Broadcast and Institutional Speech	Jun 13, 2024	Language Identification, speaker-diarization	Unverified
ML-SUPERB 2.0: Benchmarking Multilingual Speech Models Across Modeling Constraints, Languages, and Datasets	Jun 12, 2024	Automatic Speech Recognition, Automatic Speech Recognition (ASR)	Unverified
Soft Language Identification for Language-Agnostic Many-to-One End-to-End Speech Translation	Jun 12, 2024	Language Identification	Unverified
Malayalam Sign Language Identification using Finetuned YOLOv8 and Computer Vision Techniques	May 8, 2024	Language Identification	Unverified
Whispy: Adapting STT Whisper Models to Real-Time Environments	May 6, 2024	Action Detection, Activity Detection	Unverified
A Federated Learning Approach to Privacy Preserving Offensive Language Identification	Apr 17, 2024	Federated Learning, Language Identification	Unverified
What is Learnt by the LEArnable Front-end (LEAF)? Adapting Per-Channel Energy Normalisation (PCEN) to Noisy Conditions	Apr 10, 2024	Emotion Recognition, Keyword Spotting	Code Available
Geographically-Informed Language Identification	Mar 14, 2024	Language Identification	Code Available
More than words: Advancements and challenges in speech recognition for singing	Mar 14, 2024	Keyword Spotting, Language Identification	Unverified
Validating and Exploring Large Geographic Corpora	Mar 13, 2024	Language Identification, Outlier Detection	Unverified
Aligning Speech to Languages to Enhance Code-switching Speech Recognition	Mar 9, 2024	Automatic Speech Recognition, Automatic Speech Recognition (ASR)	Unverified
OWSM-CTC: An Open Encoder-Only Speech Foundation Model for Speech Recognition, Translation, and Language Identification	Feb 20, 2024	Automatic Speech Recognition, Automatic Speech Recognition (ASR)	Code Available
Code-Switched Language Identification is Harder Than You Think	Feb 2, 2024	Language Identification, Sentence	Code Available
Detecting Structured Language Alternations in Historical Documents by Combining Language Identification with Fourier Analysis	Jan 25, 2024	Language Identification	Unverified
Acoustic characterization of speech rhythm: going beyond metrics with recurrent neural networks	Jan 22, 2024	Language Identification, Rhythm	Unverified
Language Detection for Transliterated Content	Jan 9, 2024	Language Identification, Transliteration	Unverified
Generative linguistic representation for spoken language identification	Dec 18, 2023	Decoder, Language Identification	Unverified
Cross-Linguistic Offensive Language Detection: BERT-Based Analysis of Bengali, Assamese, & Bodo Conversational Hateful Content from Social Media	Dec 16, 2023	Language Identification	Unverified
Leveraging Language ID to Calculate Intermediate CTC Loss for Enhanced Code-Switching Speech Recognition	Dec 15, 2023	Automatic Speech Recognition, Language Identification	Unverified
Attention-Guided Adaptation for Code-Switching Speech Recognition	Dec 14, 2023	Language Identification, speech-recognition	Unverified
Native Language Identification with Large Language Models	Dec 13, 2023	Language Acquisition, Language Identification	Unverified
Self-supervised Adaptive Pre-training of Multilingual Speech Models for Language and Dialect Identification	Dec 12, 2023	Automatic Speech Recognition, Dialect Identification	Unverified
A Text-to-Text Model for Multilingual Offensive Language Identification	Dec 6, 2023	Decoder, Language Identification	Unverified

Datasets: VOXLINGUA107, GlotLID-C, Nordic Language Identification, OpenSubtitles, Universal Dependencies, VoxForge

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	wav2vec 2.0 LV-60K	Error rate	7.2	—	Unverified
2	XLS-R	Error rate	5.7	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	GlotLID	Macro F1	0.98	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	FastText	Accuracy	0.97	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	Apple bi-LSTM	Accuracy	91.37	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	Apple bi-LSTM	Accuracy	86.93	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	ConformerG-P	Accuracy	99.8	—	Unverified