SOTAVerified

Language Modelling

A language model is a probabilistic model of natural language. Language models are useful for a variety of tasks, including speech recognition, machine translation, natural language generation (generating more human-like text), optical character recognition, route optimization, handwriting recognition, grammar induction, and information retrieval.

Large language models (LLMs), currently their most advanced form, are predominantly based on transformers trained on large datasets (frequently using text scraped from the public internet). They have superseded recurrent neural network-based models, which had in turn superseded purely statistical models such as the word n-gram language model.

Source: Wikipedia
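
To make the n-gram idea above concrete, here is a minimal sketch of a word-level bigram model with add-one smoothing, scored by perplexity on held-out text. The toy corpus and names are illustrative only; real n-gram models are estimated from much larger corpora.

```python
import math
from collections import Counter

# Toy corpus; a real n-gram model is estimated from millions of words.
train = "the cat sat on the mat . the dog sat on the rug .".split()
held_out = "the dog sat on the mat .".split()

unigrams = Counter(train)
bigrams = Counter(zip(train, train[1:]))
vocab = len(unigrams)

def bigram_prob(prev, word):
    # Add-one (Laplace) smoothing keeps unseen bigrams from zeroing out
    # the product of probabilities.
    return (bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab)

# Perplexity is the exponential of the average negative log-probability
# the model assigns to each held-out token.
nll = -sum(math.log(bigram_prob(p, w)) for p, w in zip(held_out, held_out[1:]))
print(f"perplexity: {math.exp(nll / (len(held_out) - 1)):.2f}")
```

Lower is better: a perplexity of k means the model is, on average, as uncertain as a uniform choice among k continuations. The benchmark tables below report the same quantity for neural models.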

Papers

Showing 12801–12850 of 17610 papers

Title | Status | Hype
Composable Sparse Fine-Tuning for Cross-Lingual Transfer | Code | 1
RecInDial: A Unified Framework for Conversational Recommendation with Pretrained Language Models | - | 0
Language Modelling via Learning to Rank | - | 0
On Language Model Integration for RNN Transducer based Speech Recognition | - | 0
Maximizing Efficiency of Language Model Pre-training for Learning Representation | - | 0
Dict-BERT: Enhancing Language Model Pre-training with Dictionary | Code | 0
Deep Learning for Bias Detection: From Inception to Deployment | - | 0
Multi-Modal Pre-Training for Automated Speech Recognition | - | 0
Learning Compact Metrics for MT | Code | 1
Rescoring Sequence-to-Sequence Models for Text Line Recognition with CTC-Prefixes | Code | 1
Time Masking for Temporal Language Models | Code | 1
Småprat: DialoGPT for Natural Language Generation of Swedish Dialogue by Transfer Learning | - | 0
Balancing Average and Worst-case Accuracy in Multitask Learning | - | 0
SRU++: Pioneering Fast Recurrence with Attention for Speech Recognition | - | 0
Multi-Task Learning for Situated Multi-Domain End-to-End Dialogue Systems | - | 0
On a Benefit of Mask Language Modeling: Robustness to Simplicity Bias | - | 0
Breaking the Softmax Bottleneck for Sequential Recommender Systems with Dropout and Decoupling | - | 0
Evaluating User Perception of Speech Recognition System Quality with Semantic Distance Metric | - | 0
Advancing Momentum Pseudo-Labeling with Conformer and Initialization Strategy | - | 0
Unsupervised Neural Machine Translation with Generative Language Models Only | - | 0
Wav2vec-Switch: Contrastive Learning from Original-noisy Speech Pairs for Robust Speech Recognition | - | 0
Yuan 1.0: Large-Scale Pre-trained Language Model in Zero-Shot and Few-Shot Learning | Code | 1
Long Expressive Memory for Sequence Modeling | Code | 1
Automatic Text Extractive Summarization Based on Graph and Pre-trained Language Model Attention | - | 0
Frequency-aware SGD for Efficient Embedding Learning with Provable Benefits | - | 0
Improving Multi-Party Dialogue Discourse Parsing via Domain Integration | Code | 1
Global Explainability of BERT-Based Evaluation Metrics by Disentangling along Linguistic Factors | Code | 1
Towards Learning (Dis)-Similarity of Source Code from Program Contrasts | - | 0
KaraSinger: Score-Free Singing Voice Synthesis with VQ-VAE using Mel-spectrograms | - | 0
Layer-wise Pruning of Transformer Attention Heads for Efficient Language Modeling | Code | 1
Pretrained Language Models are Symbolic Mathematics Solvers too! | Code | 1
Mixer-TTS: non-autoregressive, fast and compact text-to-speech model conditioned on language model embeddings | Code | 1
Knowledge Distillation for Neural Transducers from Large Self-Supervised Pre-trained Models | - | 0
Improving Confidence Estimation on Out-of-Domain Data for End-to-End Speech Recognition | - | 0
Back from the future: bidirectional CTC decoding using future information in speech recognition | - | 0
Beam Search with Bidirectional Strategies for Neural Response Generation | - | 0
Internal Language Model Adaptation with Text-Only Data for End-to-End Speech Recognition | - | 0
Cut the CARP: Fishing for zero-shot story evaluation | - | 0
ABC: Attention with Bounded-memory Control | - | 0
8-bit Optimizers via Block-wise Quantization | Code | 3
Self-Supervised Knowledge Assimilation for Expert-Layman Text Style Transfer | Code | 0
Objects in Semantic Topology | - | 0
Teach Me What to Say and I Will Learn What to Pick: Unsupervised Knowledge Selection Through Response Generation with Pretrained Generative Models | - | 0
Language Modeling using LMUs: 10x Better Data Efficiency or Improved Scaling Compared to Transformers | - | 0
Attention Augmented Convolutional Transformer for Tabular Time-series | - | 0
Fast Contextual Adaptation with Neural Associative Memory for On-Device Personalized Speech Recognition | - | 0
ASR Rescoring and Confidence Estimation with ELECTRA | - | 0
A Survey On Neural Word Embeddings | - | 0
AI Chains: Transparent and Controllable Human-AI Interaction by Chaining Large Language Model Prompts | - | 0
Contextualized Semantic Distance between Highly Overlapped Texts | Code | 0
Page 257 of 353

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | Decay RNN | Validation perplexity | 76.67 | - | Unverified
2 | GRU | Validation perplexity | 53.78 | - | Unverified
3 | LSTM | Validation perplexity | 52.73 | - | Unverified
4 | LSTM | Test perplexity | 48.7 | - | Unverified
5 | Temporal CNN | Test perplexity | 45.2 | - | Unverified
6 | TCN | Test perplexity | 45.19 | - | Unverified
7 | GCNN-8 | Test perplexity | 44.9 | - | Unverified
8 | Neural cache model (size = 100) | Test perplexity | 44.8 | - | Unverified
9 | Neural cache model (size = 2,000) | Test perplexity | 40.8 | - | Unverified
10 | GPT-2 Small | Test perplexity | 37.5 | - | Unverified
# | Model | Metric | Claimed | Verified | Status
1 | TCN | Test perplexity | 108.47 | - | Unverified
2 | Seq-U-Net | Test perplexity | 107.95 | - | Unverified
3 | GRU (Bai et al., 2018) | Test perplexity | 92.48 | - | Unverified
4 | R-Transformer | Test perplexity | 84.38 | - | Unverified
5 | Zaremba et al. (2014) - LSTM (medium) | Test perplexity | 82.7 | - | Unverified
6 | Gal & Ghahramani (2016) - Variational LSTM (medium) | Test perplexity | 79.7 | - | Unverified
7 | LSTM (Bai et al., 2018) | Test perplexity | 78.93 | - | Unverified
8 | Zaremba et al. (2014) - LSTM (large) | Test perplexity | 78.4 | - | Unverified
9 | Gal & Ghahramani (2016) - Variational LSTM (large) | Test perplexity | 75.2 | - | Unverified
10 | Inan et al. (2016) - Variational RHN | Test perplexity | 66 | - | Unverified
# | Model | Metric | Claimed | Verified | Status
1 | LSTM (7 layers) | Bit per Character (BPC) | 1.67 | - | Unverified
2 | Hypernetworks | Bit per Character (BPC) | 1.34 | - | Unverified
3 | SHA-LSTM (4 layers, h=1024, no attention head) | Bit per Character (BPC) | 1.33 | - | Unverified
4 | LN HM-LSTM | Bit per Character (BPC) | 1.32 | - | Unverified
5 | ByteNet | Bit per Character (BPC) | 1.31 | - | Unverified
6 | Recurrent Highway Networks | Bit per Character (BPC) | 1.27 | - | Unverified
7 | Large FS-LSTM-4 | Bit per Character (BPC) | 1.25 | - | Unverified
8 | Large mLSTM | Bit per Character (BPC) | 1.24 | - | Unverified
9 | AWD-LSTM (3 layers) | Bit per Character (BPC) | 1.23 | - | Unverified
10 | Cluster-Former (#C=512) | Bit per Character (BPC) | 1.22 | - | Unverified
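
The two metrics used across these leaderboards, perplexity and bits per character, are both monotone transforms of a model's average negative log-likelihood: perplexity is the exponential of the per-token NLL in nats, and BPC is the per-character NLL divided by ln 2. A small sketch, with made-up loss values, shows the conversions:

```python
import math

# Hypothetical average negative log-likelihoods (cross-entropy), in nats.
nll_per_word = 4.38   # a made-up word-level model
nll_per_char = 0.85   # a made-up character-level model

# Word-level leaderboards report perplexity: exp of the per-word NLL.
perplexity = math.exp(nll_per_word)   # ~79.8

# Character-level leaderboards report bits per character: NLL / ln 2.
bpc = nll_per_char / math.log(2)      # ~1.23

print(f"perplexity {perplexity:.1f}, BPC {bpc:.2f}")
```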
# | Model | Metric | Claimed | Verified | Status
1 | Smaller Transformer 126M (pre-trained) | Test perplexity | 33 | - | Unverified
2 | OPT 125M | Test perplexity | 32.26 | - | Unverified
3 | Larger Transformer 771M (pre-trained) | Test perplexity | 28.1 | - | Unverified
4 | OPT 1.3B | Test perplexity | 19.55 | - | Unverified
5 | GPT-Neo 125M | Test perplexity | 17.83 | - | Unverified
6 | OPT 2.7B | Test perplexity | 17.81 | - | Unverified
7 | Smaller Transformer 126M (fine-tuned) | Test perplexity | 12 | - | Unverified
8 | GPT-Neo 1.3B | Test perplexity | 11.46 | - | Unverified
9 | Transformer 125M | Test perplexity | 10.7 | - | Unverified
10 | GPT-Neo 2.7B | Test perplexity | 10.44 | - | Unverified
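
Every entry above is marked Unverified, with the Verified column empty. As an illustration of what reproducing a claimed test perplexity involves, here is a minimal sketch using Hugging Face transformers; the model (GPT-Neo 125M) and dataset (WikiText-2) are assumptions for the example, and this simple windowed evaluation is not this site's verification pipeline.

```python
import math
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from datasets import load_dataset

# Assumed choices for illustration: GPT-Neo 125M on the WikiText-2 test split.
model_name = "EleutherAI/gpt-neo-125M"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name).eval()

test = load_dataset("wikitext", "wikitext-2-raw-v1", split="test")
text = "\n\n".join(t for t in test["text"] if t.strip())
ids = tokenizer(text, return_tensors="pt").input_ids

# Score in fixed-size windows; passing labels equal to the inputs makes the
# model compute the average next-token cross-entropy internally.
window, total_nll, total_tokens = 1024, 0.0, 0
with torch.no_grad():
    for start in range(0, ids.size(1) - 1, window):
        chunk = ids[:, start : start + window + 1]
        out = model(chunk, labels=chunk)
        n = chunk.size(1) - 1  # number of predicted tokens in this window
        total_nll += out.loss.item() * n
        total_tokens += n

print(f"test perplexity: {math.exp(total_nll / total_tokens):.2f}")
```

Note that perplexities are only comparable between models that share a tokenization and evaluation protocol (for instance, a strided evaluation with more context per token typically yields lower perplexity than the windowed one above), which is one reason a Claimed number can be hard to verify exactly.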