Language Identification

Language identification is the task of determining the language of a text.
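
As a rough illustration of the task, the sketch below guesses a language by comparing character trigram counts against tiny per-language profiles. The sample sentences, the `SAMPLES` and `PROFILES` tables, and the helper names `char_ngrams` and `identify` are all hypothetical and chosen only for this example; none of the systems listed on this page are claimed to work this way.

```python
from collections import Counter

def char_ngrams(text, n=3):
    """Count character n-grams of a lowercased, space-padded string."""
    padded = f" {text.lower()} "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))

# Hypothetical toy training text; a real system would use far more data.
SAMPLES = {
    "en": "the quick brown fox jumps over the lazy dog and this is the language of a text",
    "de": "der schnelle braune fuchs springt über den faulen hund und das ist die sprache",
    "es": "el rápido zorro marrón salta sobre el perro perezoso y este es el idioma",
}

# One trigram profile per language, built from the toy samples above.
PROFILES = {lang: char_ngrams(sample) for lang, sample in SAMPLES.items()}

def identify(text):
    """Return the language whose profile shares the most trigrams with text."""
    query = char_ngrams(text)

    def overlap(profile):
        # Counter returns 0 for missing keys, so unseen trigrams add nothing.
        return sum(min(count, profile[gram]) for gram, count in query.items())

    return max(PROFILES, key=lambda lang: overlap(PROFILES[lang]))

print(identify("the dog is lazy"))
```

With profiles this small the guess is only dependable for inputs that closely resemble the samples; short or code-mixed text, which many of the papers below address, is much harder.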

Papers

Showing 201–225 of 794 papers

Title	Date	Tasks	Status	Hype
Hyperseed: Unsupervised Learning with Vector Symbolic Architectures	Oct 15, 2021	Few-Shot Learning, Language Identification	Code Available	1
Mandarin-English Code-switching Speech Recognition with Self-supervised Speech Representation Models	Oct 7, 2021	Language Identification, Self-Supervised Learning	Unverified	0
Pretrained Transformers for Offensive Language Identification in Tanglish	Oct 6, 2021	Language Identification, Text Classification	Code Available	0
Is Attention always needed? A Case Study on Language Identification from Speech	Oct 5, 2021	Automatic Speech Recognition, Automatic Speech Recognition (ASR)	Unverified	0
BigSSL: Exploring the Frontier of Large-Scale Semi-Supervised Learning for Automatic Speech Recognition	Sep 27, 2021	Automatic Speech Recognition, Automatic Speech Recognition (ASR)	Unverified	0
Language Identification with a Reciprocal Rank Classifier	Sep 20, 2021	Domain Adaptation, Language Identification	Code Available	0
UPV at CheckThat! 2021: Mitigating Cultural Differences for Identifying Multilingual Check-worthy Claims	Sep 19, 2021	Fact Checking, Language Identification	Code Available	0
Unsupervised Personality-Aware Language Identification	Sep 17, 2021	Language Identification	Unverified	0
The futility of STILTs for the classification of lexical borrowings in Spanish	Sep 17, 2021	Language Identification, named-entity-recognition	Unverified	0
On the Language-specificity of Multilingual BERT and the Impact of Fine-tuning	Sep 14, 2021	Language Identification, Natural Language Inference	Code Available	0
FBERT: A Neural Transformer for Identifying Offensive Content	Sep 10, 2021	Language Identification, XLM-R	Unverified	0
Cross-lingual Offensive Language Identification for Low Resource Languages: The Case of Marathi	Sep 8, 2021	Language Identification, Transfer Learning	Code Available	0
A Pre-trained Transformer and CNN Model with Joint Language ID and Part-of-Speech Tagging for Code-Mixed Social-Media Text	Sep 1, 2021	Language Identification, Part-Of-Speech Tagging	Unverified	0
Fiction in Russian Translation: A Translationese Study	Sep 1, 2021	Binary Classification, Language Identification	Unverified	0
Corpus Creation and Language Identification in Low-Resource Code-Mixed Telugu-English Text	Sep 1, 2021	Classification, Language Identification	Unverified	0
Offensive Language Identification in Low-resourced Code-mixed Dravidian languages using Pseudo-labeling	Aug 27, 2021	Language Identification, Marketing	Code Available	0
Towards Offensive Language Identification for Tamil Code-Mixed YouTube Comments and Posts	Aug 24, 2021	Language Identification, Transfer Learning	Code Available	0
A Dual-Decoder Conformer for Multilingual Speech Recognition	Aug 22, 2021	Decoder, Language Identification	Unverified	0
Dyn-ASR: Compact, Multilingual Speech Recognition via Spoken Language and Accent Identification	Aug 4, 2021	Automatic Speech Recognition, Automatic Speech Recognition (ASR)	Unverified	0
OLR 2021 Challenge: Datasets, Rules and Baselines	Jul 23, 2021	Automatic Speech Recognition, Automatic Speech Recognition (ASR)	Unverified	0
Improved Language Identification Through Cross-Lingual Self-Supervised Learning	Jul 8, 2021	Automatic Speech Recognition, Automatic Speech Recognition (ASR)	Unverified	0
Oriental Language Recognition (OLR) 2020: Summary and Analysis	Jul 5, 2021	Dialect Identification, Language Identification	Unverified	0
Language Identification of Hindi-English tweets using code-mixed BERT	Jul 2, 2021	Language Identification, Transfer Learning	Unverified	0
Language Lexicons for Hindi-English Multilingual Text Processing	Jun 29, 2021	Language Identification	Unverified	0
A Simple and Efficient Probabilistic Language model for Code-Mixed Text	Jun 29, 2021	Information Retrieval, Language Identification	Unverified	0

Benchmark Results

VOXLINGUA107

#	Model	Metric	Claimed	Verified	Status
1	wav2vec 2.0 LV-60K	Error rate	7.2	—	Unverified
2	XLS-R	Error rate	5.7	—	Unverified

GlotLID-C

#	Model	Metric	Claimed	Verified	Status
1	GlotLID	Macro F1	0.98	—	Unverified

Nordic Language Identification

#	Model	Metric	Claimed	Verified	Status
1	FastText	Accuracy	0.97	—	Unverified

OpenSubtitles

#	Model	Metric	Claimed	Verified	Status
1	Apple bi-LSTM	Accuracy	91.37	—	Unverified

Universal Dependencies

#	Model	Metric	Claimed	Verified	Status
1	Apple bi-LSTM	Accuracy	86.93	—	Unverified

VoxForge

#	Model	Metric	Claimed	Verified	Status
1	ConformerG-P	Accuracy	99.8	—	Unverified