SOTAVerified

Language Modelling

A language model is a probabilistic model of natural language: it assigns probabilities to sequences of words or characters. Language models are useful for a variety of tasks, including speech recognition, machine translation, natural language generation (generating more human-like text), optical character recognition, route optimization, handwriting recognition, grammar induction, and information retrieval.

Large language models (LLMs), currently their most advanced form, are predominantly based on transformers trained on large datasets (frequently using text scraped from the public web). They have superseded recurrent neural network-based models, which had previously superseded purely statistical models, such as word n-gram language models.

Source: Wikipedia
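The "purely statistical models" mentioned above can be made concrete with a tiny example. Below is a minimal word-bigram language model in Python; the toy corpus and the add-one smoothing are illustrative choices of ours, not something taken from this page:

```python
from collections import Counter

# Toy corpus; any tokenized text works the same way.
corpus = "the cat sat on the mat . the dog sat on the rug .".split()

bigrams = Counter(zip(corpus, corpus[1:]))  # counts of adjacent word pairs
contexts = Counter(corpus[:-1])             # how often each word opens a bigram
vocab = set(corpus)

def bigram_prob(prev: str, word: str) -> float:
    """P(word | prev) with add-one (Laplace) smoothing."""
    return (bigrams[(prev, word)] + 1) / (contexts[prev] + len(vocab))

def sentence_prob(words: list[str]) -> float:
    """Probability of a word sequence as a product of bigram probabilities."""
    p = 1.0
    for prev, word in zip(words, words[1:]):
        p *= bigram_prob(prev, word)
    return p

print(bigram_prob("the", "cat"))  # seen bigram: relatively high probability
print(bigram_prob("the", "sat"))  # unseen bigram: small but nonzero, thanks to smoothing
print(sentence_prob("the dog sat on the mat .".split()))
```

Smoothing is what separates a usable n-gram model from a brittle one: without it, any test sentence containing a single unseen bigram would receive probability zero.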

Papers

Showing 15751–15800 of 17610 papers

Title | Status | Hype
Language Models Learn POS First | – | 0
Representation of Word Meaning in the Intermediate Projection Layer of a Neural Language Model | – | 0
Language Modeling Teaches You More than Translation Does: Lessons Learned Through Auxiliary Syntactic Task Analysis | – | 0
What do RNN Language Models Learn about Filler–Gap Dependencies? | – | 0
Juman++: A Morphological Analysis Toolkit for Scriptio Continua | Code | 0
On the End-to-End Solution to Mandarin-English Code-switching Speech Recognition | Code | 0
Understanding Learning Dynamics Of Language Models with SVCCA | – | 0
Towards Coherent and Cohesive Long-form Text Generation | – | 0
Improving Machine Reading Comprehension with General Reading Strategies | Code | 0
Towards End-to-end Automatic Code-Switching Speech Recognition | – | 0
Learning Cross-Lingual Sentence Representations via a Multi-task Dual-Encoder Model | – | 0
Recurrent Attention Unit | – | 0
Language Modeling with Sparse Product of Sememe Experts | Code | 0
Visual Re-ranking with Natural Language Understanding for Text Spotting | Code | 0
Counting in Language with RNNs | – | 0
Cascaded CNN-resBiLSTM-CTC: An End-to-End Acoustic Model For Speech Recognition | – | 0
Language Modeling for Code-Switching: Evaluation, Integration of Monolingual Data, and Discriminative Training | Code | 0
Can Entropy Explain Successor Surprisal Effects in Reading? | – | 0
Fast Neural Architecture Search of Compact Semantic Segmentation Models via Auxiliary Cells | Code | 1
Learn to Code-Switch: Data Augmentation using Copy Mechanism on Language Modeling | – | 0
Universal Language Model Fine-Tuning with Subword Tokenization for Polish | Code | 0
A Deep Generative Acoustic Model for Compositional Automatic Speech Recognition | – | 0
Neural Transition-based Syntactic Linearization | – | 0
Language Modeling at Scale | – | 0
Training Neural Speech Recognition Systems with Synthetic Speech Augmentation | – | 0
Bridging HMMs and RNNs through Architectural Transformations | – | 0
Ordered Neurons: Integrating Tree Structures into Recurrent Neural Networks | Code | 0
Real-time Neural-based Input Method | – | 0
Continual Learning of Recurrent Neural Networks by Locally Aligning Distributed Representations | Code | 0
Evolutionary Stochastic Gradient Descent for Optimization of Deep Neural Networks | Code | 0
Structured Content Preservation for Unsupervised Text Style Transfer | Code | 0
Trellis Networks for Sequence Modeling | Code | 0
Exploring the Use of Attention within an Neural Machine Translation Decoder States to Translate Idioms | – | 0
textTOvec: Deep Contextualized Neural Autoregressive Topic Models of Language with Distributed Compositional Prior | Code | 0
Unsupervised Neural Word Segmentation for Chinese via Segmental Language Modeling | Code | 0
Understanding Recurrent Neural Architectures by Analyzing and Synthesizing Long Distance Dependencies in Benchmark Sequential Datasets | – | 0
Learning Compressed Transforms with Low Displacement Rank | Code | 0
Multilingual sequence-to-sequence speech recognition: architecture, transfer learning, and language modeling | – | 0
The Sogou-TIIC Speech Translation System for IWSLT 2018 | – | 0
Zero-Shot Learning Based Approach For Medieval Word Recognition Using Deep-Learned Features | – | 0
On the Use of Speaker-Aware Language Model Adaptation Techniques for Meeting Speech Recognition [In Chinese] | – | 0
Prompsit's submission to WMT 2018 Parallel Corpus Filtering shared task | Code | 1
Supervised and Unsupervised Minimalist Quality Estimators: Vicomtech's Participation in the WMT 2018 Quality Estimation Task | – | 0
The ILSP/ARC submission to the WMT 2018 Parallel Corpus Filtering Shared Task | – | 0
The LMU Munich Unsupervised Machine Translation Systems | – | 0
Learning Comment Controversy Prediction in Web Discussions Using Incidentally Supervised Multi-Task CNNs | – | 0
Keep It or Not: Word Level Quality Estimation for Post-Editing | – | 0
RTM results for Predicting Translation Performance | – | 0
The JHU Parallel Corpus Filtering Systems for WMT 2018 | – | 0
Sounds Wilde. Phonetically Extended Embeddings for Author-Stylized Poetry Generation | – | 0
Page 316 of 353

Benchmark Results
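The leaderboards below report two metrics: word-level perplexity and bits per character (BPC). Both are rescalings of the model's average negative log-likelihood (NLL) on the test set: perplexity exponentiates the per-word NLL in nats, while BPC is the per-character NLL expressed in bits. A minimal sketch of the conversion (the loss values are made up for illustration):

```python
import math

# Average negative log-likelihood on a test set, in nats.
# Both values below are made up for illustration.
nll_per_word = 4.1    # nats per word (word-level benchmarks)
nll_per_char = 0.85   # nats per character (character-level benchmarks)

perplexity = math.exp(nll_per_word)  # ~60.3: the "effective branching factor"
bpc = nll_per_char / math.log(2)     # ~1.23: convert nats to bits

print(f"perplexity = {perplexity:.2f}")
print(f"BPC        = {bpc:.2f}")
```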

# | Model | Metric | Claimed | Verified | Status
1 | Decay RNN | Validation perplexity | 76.67 | – | Unverified
2 | GRU | Validation perplexity | 53.78 | – | Unverified
3 | LSTM | Validation perplexity | 52.73 | – | Unverified
4 | LSTM | Test perplexity | 48.7 | – | Unverified
5 | Temporal CNN | Test perplexity | 45.2 | – | Unverified
6 | TCN | Test perplexity | 45.19 | – | Unverified
7 | GCNN-8 | Test perplexity | 44.9 | – | Unverified
8 | Neural cache model (size = 100) | Test perplexity | 44.8 | – | Unverified
9 | Neural cache model (size = 2,000) | Test perplexity | 40.8 | – | Unverified
10 | GPT-2 Small | Test perplexity | 37.5 | – | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | TCN | Test perplexity | 108.47 | – | Unverified
2 | Seq-U-Net | Test perplexity | 107.95 | – | Unverified
3 | GRU (Bai et al., 2018) | Test perplexity | 92.48 | – | Unverified
4 | R-Transformer | Test perplexity | 84.38 | – | Unverified
5 | Zaremba et al. (2014) - LSTM (medium) | Test perplexity | 82.7 | – | Unverified
6 | Gal & Ghahramani (2016) - Variational LSTM (medium) | Test perplexity | 79.7 | – | Unverified
7 | LSTM (Bai et al., 2018) | Test perplexity | 78.93 | – | Unverified
8 | Zaremba et al. (2014) - LSTM (large) | Test perplexity | 78.4 | – | Unverified
9 | Gal & Ghahramani (2016) - Variational LSTM (large) | Test perplexity | 75.2 | – | Unverified
10 | Inan et al. (2016) - Variational RHN | Test perplexity | 66 | – | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | LSTM (7 layers) | Bits per character (BPC) | 1.67 | – | Unverified
2 | Hypernetworks | Bits per character (BPC) | 1.34 | – | Unverified
3 | SHA-LSTM (4 layers, h=1024, no attention head) | Bits per character (BPC) | 1.33 | – | Unverified
4 | LN HM-LSTM | Bits per character (BPC) | 1.32 | – | Unverified
5 | ByteNet | Bits per character (BPC) | 1.31 | – | Unverified
6 | Recurrent Highway Networks | Bits per character (BPC) | 1.27 | – | Unverified
7 | Large FS-LSTM-4 | Bits per character (BPC) | 1.25 | – | Unverified
8 | Large mLSTM | Bits per character (BPC) | 1.24 | – | Unverified
9 | AWD-LSTM (3 layers) | Bits per character (BPC) | 1.23 | – | Unverified
10 | Cluster-Former (#C=512) | Bits per character (BPC) | 1.22 | – | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Smaller Transformer 126M (pre-trained) | Test perplexity | 33 | – | Unverified
2 | OPT 125M | Test perplexity | 32.26 | – | Unverified
3 | Larger Transformer 771M (pre-trained) | Test perplexity | 28.1 | – | Unverified
4 | OPT 1.3B | Test perplexity | 19.55 | – | Unverified
5 | GPT-Neo 125M | Test perplexity | 17.83 | – | Unverified
6 | OPT 2.7B | Test perplexity | 17.81 | – | Unverified
7 | Smaller Transformer 126M (fine-tuned) | Test perplexity | 12 | – | Unverified
8 | GPT-Neo 1.3B | Test perplexity | 11.46 | – | Unverified
9 | Transformer 125M | Test perplexity | 10.7 | – | Unverified
10 | GPT-Neo 2.7B | Test perplexity | 10.44 | – | Unverified
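For pretrained models like those in the last table, test perplexity is obtained by scoring the benchmark's test split with the model and exponentiating the mean next-token loss. A minimal sketch using the Hugging Face transformers library; the model name and the single short input are stand-ins of ours, and published numbers additionally depend on the benchmark's own test text, a sliding evaluation window, and tokenization details:

```python
import math
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # stand-in; any causal LM on the Hub works the same way
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

# Stand-in for a benchmark's test split.
text = "Language models assign probabilities to sequences of words."
enc = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    # With labels == input_ids, the model shifts the targets internally and
    # returns the mean cross-entropy (in nats) over next-token predictions.
    loss = model(**enc, labels=enc["input_ids"]).loss

print(f"test perplexity = {math.exp(loss.item()):.2f}")
```

Because evaluation choices like context window and stride shift the resulting number, claimed perplexities from different papers are not always directly comparable, which is precisely why a "Verified" column exists in these tables.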