Language Identification

Language identification is the task of determining the language of a text.
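As a rough illustration of the task for written text, the sketch below scores an input against per-language character-trigram counts with add-one smoothing and returns the best-scoring language. It is a toy example under stated assumptions, not the method of any paper listed on this page: the `SAMPLES` sentences, the language codes, and the helper names `char_ngrams`, `train`, and `identify` are all hypothetical, and a usable system would need far more training text.

```python
import math
from collections import Counter

# Hypothetical toy training text; real systems train on large corpora.
SAMPLES = {
    "en": "the quick brown fox jumps over the lazy dog and then the dog sleeps in the sun",
    "de": "der schnelle braune fuchs springt über den faulen hund und dann schläft der hund in der sonne",
    "fr": "le renard brun rapide saute par-dessus le chien paresseux et puis le chien dort au soleil",
}


def char_ngrams(text, n=3):
    """Return the character n-grams of the lowercased text, padded with spaces."""
    padded = f" {text.lower()} "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


def train(samples, n=3):
    """Count character n-grams for each language."""
    return {lang: Counter(char_ngrams(text, n)) for lang, text in samples.items()}


def identify(text, profiles, n=3):
    """Return the language whose smoothed n-gram model gives the text the highest log-probability."""
    vocab_size = len(set().union(*profiles.values())) + 1  # +1 reserves mass for unseen n-grams
    scores = {}
    for lang, counts in profiles.items():
        total = sum(counts.values())
        scores[lang] = sum(
            math.log((counts[gram] + 1) / (total + vocab_size))
            for gram in char_ngrams(text, n)
        )
    return max(scores, key=scores.get)


PROFILES = train(SAMPLES)
print(identify("the dog sleeps", PROFILES))
```

Character n-grams are used here because they need no tokenizer and still work on short strings. Spoken language identification, which many of the papers below address, operates on audio features instead and is not covered by this sketch.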

Papers

Showing 151–200 of 794 papers

Title	Date	Tasks	Status
LAMASSU: Streaming Language-Agnostic Multilingual Speech Recognition and Translation Using Neural Transducers	Nov 5, 2022	Automatic Speech Recognition (ASR)	Unverified
A Compact End-to-End Model with Local and Global Context for Spoken Language Identification	Oct 27, 2022	Language Identification, Spoken language identification	Unverified
Italian Language and Dialect Identification and Regional French Variety Detection using Adaptive Naive Bayes	Oct 1, 2022	Dialect Identification, Language Identification	Code Available
Neural Networks for Cross-domain Language Identification. Phlyers @Vardial 2022	Oct 1, 2022	Language Identification	Unverified
OcWikiDisc: a Corpus of Wikipedia Talk Pages in Occitan	Oct 1, 2022	8k, Language Identification	Unverified
The Curious Case of Logistic Regression for Italian Languages and Dialects Identification	Oct 1, 2022	Language Identification, Machine Translation	Code Available
Streaming End-to-End Multilingual Speech Recognition with Joint Language Identification	Sep 13, 2022	Automatic Speech Recognition (ASR)	Unverified
Evaluation of Off-the-Shelf Language Identification Tools on Bulgarian Social Media Posts	Sep 1, 2022	Language Identification	Unverified
Unravelling Interlanguage Facts via Explainable Machine Learning	Aug 2, 2022	BIG-bench Machine Learning, Language Identification	Unverified
Extending RNN-T-based speech recognition systems with emotion and language classification	Jul 28, 2022	Emotion Classification, Emotion Recognition	Unverified
Distilled Non-Semantic Speech Embeddings with Binary Neural Networks for Low-Resource Devices	Jul 12, 2022	Emotion Recognition, Keyword Spotting	Code Available
Huqariq: A Multilingual Speech Corpus of Native Languages of Peru for Speech Recognition	Jul 12, 2022	Automatic Speech Recognition (ASR)	Unverified
TechSSN at SemEval-2022 Task 6: Intended Sarcasm Detection using Transformer Models	Jul 1, 2022	Language Identification, Sarcasm Detection	Unverified
Language Identification for Austronesian Languages	Jun 9, 2022	Language Identification	Code Available
HeLI-OTS, Off-the-shelf Language Identifier for Text	Jun 1, 2022	Language Identification	Unverified
Huqariq: A Multilingual Speech Corpus of Native Languages of Peru for Speech Recognition	Jun 1, 2022	Automatic Speech Recognition (ASR)	Unverified
Deep learning-based end-to-end spoken language identification system for domain-mismatched scenario	Jun 1, 2022	Language Identification, Speaker Verification	Unverified
CoSwID, a Code Switching Identification Method Suitable for Under-Resourced Languages	Jun 1, 2022	Language Identification	Unverified
GeezSwitch: Language Identification in Typologically Related Low-resourced East African Languages	Jun 1, 2022	Language Identification, Machine Translation	Code Available
MHE: Code-Mixed Corpora for Similar Language Identification	Jun 1, 2022	Language Identification, Sentence	Unverified
Universal Dependencies Treebank for Tatar: Incorporating Intra-Word Code-Switching Information	Jun 1, 2022	Language Identification, POS	Unverified
Dialects Identification of Armenian Language	Jun 1, 2022	Dialect Identification, Language Identification	Unverified
Adversarial synthesis based data-augmentation for code-switched spoken language identification	May 30, 2022	Automatic Speech Recognition (ASR)	Unverified
FLEURS: Few-shot Learning Evaluation of Universal Representations of Speech	May 25, 2022	Automatic Speech Recognition (ASR)	Code Available
Modernizing Open-Set Speech Language Identification	May 20, 2022	Language Identification, Speech Language Identification	Unverified
Automatic Spoken Language Identification using a Time-Delay Neural Network	May 19, 2022	Automatic Speech Recognition (ASR)	Unverified
Pretraining Approaches for Spoken Language Recognition: TalTech Submission to the OLR 2021 Challenge	May 14, 2022	Automatic Speech Recognition (ASR)	Unverified
Building Machine Translation Systems for the Next Thousand Languages	May 9, 2022	Language Identification, Machine Translation	Unverified
TuGeBiC: A Turkish German Bilingual Code-Switching Corpus	May 2, 2022	Language Identification	Unverified
Unsupervised Preference-Aware Language Identification	May 1, 2022	Language Identification	Code Available
Findings of the Shared Task on Multi-task Learning in Dravidian Languages	May 1, 2022	Language Identification, Multi-Task Learning	Unverified
Automated speech tools for helping communities process restricted-access corpora for language revival efforts	Apr 15, 2022	Action Detection, Activity Detection	Unverified
Transducer-based language embedding for spoken language identification	Apr 8, 2022	Language Identification, Spoken language identification	Unverified
Partial Coupling of Optimal Transport for Spoken Language Identification	Mar 31, 2022	Domain Adaptation, Language Identification	Unverified
Improving Language Identification of Accented Speech	Mar 31, 2022	Language Identification, speech-recognition	Unverified
Code Switched and Code Mixed Speech Recognition for Indic languages	Mar 30, 2022	Automatic Speech Recognition (ASR)	Unverified
Geographic Adaptation of Pretrained Language Models	Mar 16, 2022	Language Identification, Language Modeling	Code Available
Automatic Language Identification for Celtic Texts	Mar 9, 2022	Language Identification	Unverified
Enhance Language Identification using Dual-mode Model with Knowledge Distillation	Mar 7, 2022	Knowledge Distillation, Language Identification	Code Available
Towards a Common Speech Analysis Engine	Mar 1, 2022	Emotion Recognition, Language Identification	Unverified
Attentive Temporal Pooling for Conformer-based Streaming Language Identification in Long-form Speech	Feb 24, 2022	Domain Adaptation, Form	Code Available
CALCS 2021 Shared Task: Machine Translation for Code-Switched Data	Feb 19, 2022	Language Identification, Machine Translation	Unverified
HaT5: Hate Language Identification using Text-to-Text Transfer Transformer	Feb 11, 2022	Data Augmentation, Explainable artificial intelligence	Unverified
Translated Texts Under the Lens: From Machine Translation Detection to Source Language Identification	Jan 16, 2022	Language Identification, Machine Translation	Unverified
Cognitive Computing to Optimize IT Services	Dec 28, 2021	Language Identification, Text Summarization	Unverified
LUC at ComMA-2021 Shared Task: Multilingual Gender Biased and Communal Language Identification without using linguistic features	Dec 19, 2021	Language Identification	Unverified
Integrating Knowledge in End-to-End Automatic Speech Recognition for Mandarin-English Code-Switching	Dec 19, 2021	Automatic Speech Recognition (ASR)	Unverified
Robust Speech Representation Learning via Flow-based Embedding Regularization	Dec 7, 2021	Deep Learning, Language Identification	Unverified
MUM at ComMA@ICON: Multilingual Gender Biased and Communal Language Identification Using Supervised Learning Approaches	Dec 1, 2021	Language Identification	Unverified
BFCAI at ComMA@ICON 2021: Support Vector Machines for Multilingual Gender Biased and Communal Language Identification	Dec 1, 2021	Language Identification	Unverified

Benchmark Results

Datasets: VOXLINGUA107, GlotLID-C, Nordic Language Identification, OpenSubtitles, Universal Dependencies, VoxForge

VOXLINGUA107

#	Model	Metric	Claimed	Verified	Status
1	wav2vec 2.0 LV-60K	Error rate	7.2	n/a	Unverified
2	XLS-R	Error rate	5.7	n/a	Unverified

GlotLID-C

#	Model	Metric	Claimed	Verified	Status
1	GlotLID	Macro F1	0.98	n/a	Unverified

Nordic Language Identification

#	Model	Metric	Claimed	Verified	Status
1	FastText	Accuracy	0.97	n/a	Unverified

OpenSubtitles

#	Model	Metric	Claimed	Verified	Status
1	Apple bi-LSTM	Accuracy	91.37	n/a	Unverified

Universal Dependencies

#	Model	Metric	Claimed	Verified	Status
1	Apple bi-LSTM	Accuracy	86.93	n/a	Unverified

VoxForge

#	Model	Metric	Claimed	Verified	Status
1	ConformerG-P	Accuracy	99.8	n/a	Unverified