SOTAVerified

Language Modelling

A language model is a model of natural language. Language models are useful for a variety of tasks, including speech recognition, machine translation, natural language generation (generating more human-like text), optical character recognition, route optimization, handwriting recognition, grammar induction, and information retrieval.

Large language models (LLMs), currently their most advanced form, are predominantly based on transformers trained on large datasets (frequently using text scraped from the public internet). They have superseded recurrent neural network-based models, which had previously superseded purely statistical models such as word n-gram language models.
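The purely statistical baseline mentioned above is easy to state concretely: a word-bigram model estimates P(w_i | w_{i-1}) directly from counts. A minimal sketch (toy corpus and function name are illustrative, not from any paper listed below):

```python
from collections import Counter, defaultdict

def train_bigram_lm(tokens):
    """Estimate P(w_i | w_{i-1}) as count(w_{i-1}, w_i) / count(w_{i-1})."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    probs = defaultdict(dict)
    for (prev, curr), count in bigrams.items():
        probs[prev][curr] = count / unigrams[prev]
    return probs

corpus = "the cat sat on the mat the cat ran".split()
lm = train_bigram_lm(corpus)
print(lm["the"])  # P(cat|the) = 2/3, P(mat|the) = 1/3
```

Real n-gram systems add smoothing (e.g. Kneser-Ney) so that unseen bigrams do not receive zero probability; this unsmoothed version only illustrates the counting idea.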

Source: Wikipedia

Papers

Showing 15551-15600 of 17610 papers

Title | Status | Hype
Enabling Robots to Understand Incomplete Natural Language Instructions Using Commonsense Reasoning | | 0
Routing Networks and the Challenges of Modular and Compositional Computation | Code | 0
Neural Machine Translation with Recurrent Highway Networks | | 0
Several Experiments on Investigating Pretraining and Knowledge-Enhanced Models for Natural Language Inference | Code | 0
Think Again Networks and the Delta Loss | Code | 0
Probing What Different NLP Tasks Teach Machines about Function Word Comprehension | | 0
Detecting Machine-Translated Paragraphs by Matching Similar Words | | 0
TextKD-GAN: Text Generation using Knowledge Distillation and Generative Adversarial Networks | Code | 0
Generating Long Sequences with Sparse Transformers | Code | 3
Adversarial Dropout for Recurrent Neural Networks | Code | 0
The Curious Case of Neural Text Degeneration | Code | 1
Investigating Prior Knowledge for Challenging Chinese Machine Reading Comprehension | Code | 0
Good-Enough Compositional Data Augmentation | Code | 0
Few-Shot NLG with Pre-Trained Language Model | Code | 0
Language Models with Transformers | Code | 0
An Evaluation of Transfer Learning for Classifying Sales Engagement Emails at Large Scale | | 0
Mask-Predict: Parallel Decoding of Conditional Masked Language Models | Code | 1
Suggestion Mining from Online Reviews using ULMFiT | Code | 0
Language Modeling through Long Term Memory Network | | 0
SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition | Code | 1
Sparseout: Controlling Sparsity in Deep Networks | Code | 0
Guiding CTC Posterior Spike Timings for Improved Posterior Fusion and Knowledge Distillation | | 0
Effective Estimation of Deep Generative Language Models | Code | 0
Dynamic Evaluation of Transformer Language Models | Code | 0
Sameness Entices, but Novelty Enchants in Fanfiction Online | Code | 0
Pun Generation with Surprise | Code | 0
Rare Words: A Major Problem for Contextualized Embeddings And How to Fix it by Attentive Mimicking | Code | 0
Legal Area Classification: A Comparative Study of Text Classifiers on Singapore Supreme Court Judgments | | 0
IIT (BHU) Varanasi at MSR-SRST 2018: A Language Model Based Approach for Natural Language Generation | Code | 0
A Graph-based Model for Joint Chinese Word Segmentation and Dependency Parsing | Code | 0
Knowledge-Augmented Language Model and its Application to Unsupervised Named-Entity Recognition | Code | 0
Exploring Methods for the Automatic Detection of Errors in Manual Transcription | | 0
A Statistical Investigation of Long Memory in Language and Music | Code | 0
Knowledge Distillation For Recurrent Neural Network Language Modeling With Trust Regularization | | 0
WeNet: Weighted Networks for Recurrent Network Architecture Search | | 0
Unsupervised Recurrent Neural Network Grammars | Code | 0
SEQ^3: Differentiable Sequence-to-Sequence-to-Sequence Autoencoder for Unsupervised Abstractive Sentence Compression | Code | 0
Jasper: An End-to-End Convolutional Neural Acoustic Model | Code | 0
Alternative Weighting Schemes for ELMo Embeddings | Code | 0
Identifying and Reducing Gender Bias in Word-Level Language Models | | 0
Sequence-to-Sequence Speech Recognition with Time-Depth Separable Convolutions | | 0
Riemannian Normalizing Flow on Variational Wasserstein Autoencoder for Text Modeling | Code | 0
Unsupervised Domain Adaptation of Contextualized Embeddings for Sequence Labeling | Code | 0
Visualizing Attention in Transformer-Based Language Representation Models | | 0
VideoBERT: A Joint Model for Video and Language Representation Learning | Code | 0
Understanding language-elicited EEG data by predicting it from a fine-tuned language model | | 0
A string-to-graph constructive alignment algorithm for discrete and probabilistic language modeling | | 0
fairseq: A Fast, Extensible Toolkit for Sequence Modeling | Code | 1
Conversation Model Fine-Tuning for Classifying Client Utterances in Counseling Dialogues | | 0
SART - Similarity, Analogies, and Relatedness for Tatar Language: New Benchmark Datasets for Word Embeddings Evaluation | Code | 0
Page 312 of 353

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | Decay RNN | Validation perplexity | 76.67 | | Unverified
2 | GRU | Validation perplexity | 53.78 | | Unverified
3 | LSTM | Validation perplexity | 52.73 | | Unverified
4 | LSTM | Test perplexity | 48.7 | | Unverified
5 | Temporal CNN | Test perplexity | 45.2 | | Unverified
6 | TCN | Test perplexity | 45.19 | | Unverified
7 | GCNN-8 | Test perplexity | 44.9 | | Unverified
8 | Neural cache model (size = 100) | Test perplexity | 44.8 | | Unverified
9 | Neural cache model (size = 2,000) | Test perplexity | 40.8 | | Unverified
10 | GPT-2 Small | Test perplexity | 37.5 | | Unverified
# | Model | Metric | Claimed | Verified | Status
1 | TCN | Test perplexity | 108.47 | | Unverified
2 | Seq-U-Net | Test perplexity | 107.95 | | Unverified
3 | GRU (Bai et al., 2018) | Test perplexity | 92.48 | | Unverified
4 | R-Transformer | Test perplexity | 84.38 | | Unverified
5 | Zaremba et al. (2014) - LSTM (medium) | Test perplexity | 82.7 | | Unverified
6 | Gal & Ghahramani (2016) - Variational LSTM (medium) | Test perplexity | 79.7 | | Unverified
7 | LSTM (Bai et al., 2018) | Test perplexity | 78.93 | | Unverified
8 | Zaremba et al. (2014) - LSTM (large) | Test perplexity | 78.4 | | Unverified
9 | Gal & Ghahramani (2016) - Variational LSTM (large) | Test perplexity | 75.2 | | Unverified
10 | Inan et al. (2016) - Variational RHN | Test perplexity | 66 | | Unverified
# | Model | Metric | Claimed | Verified | Status
1 | LSTM (7 layers) | Bits per Character (BPC) | 1.67 | | Unverified
2 | Hypernetworks | Bits per Character (BPC) | 1.34 | | Unverified
3 | SHA-LSTM (4 layers, h=1024, no attention head) | Bits per Character (BPC) | 1.33 | | Unverified
4 | LN HM-LSTM | Bits per Character (BPC) | 1.32 | | Unverified
5 | ByteNet | Bits per Character (BPC) | 1.31 | | Unverified
6 | Recurrent Highway Networks | Bits per Character (BPC) | 1.27 | | Unverified
7 | Large FS-LSTM-4 | Bits per Character (BPC) | 1.25 | | Unverified
8 | Large mLSTM | Bits per Character (BPC) | 1.24 | | Unverified
9 | AWD-LSTM (3 layers) | Bits per Character (BPC) | 1.23 | | Unverified
10 | Cluster-Former (#C=512) | Bits per Character (BPC) | 1.22 | | Unverified
# | Model | Metric | Claimed | Verified | Status
1 | Smaller Transformer 126M (pre-trained) | Test perplexity | 33 | | Unverified
2 | OPT 125M | Test perplexity | 32.26 | | Unverified
3 | Larger Transformer 771M (pre-trained) | Test perplexity | 28.1 | | Unverified
4 | OPT 1.3B | Test perplexity | 19.55 | | Unverified
5 | GPT-Neo 125M | Test perplexity | 17.83 | | Unverified
6 | OPT 2.7B | Test perplexity | 17.81 | | Unverified
7 | Smaller Transformer 126M (fine-tuned) | Test perplexity | 12 | | Unverified
8 | GPT-Neo 1.3B | Test perplexity | 11.46 | | Unverified
9 | Transformer 125M | Test perplexity | 10.7 | | Unverified
10 | GPT-Neo 2.7B | Test perplexity | 10.44 | | Unverified
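The two metrics in the tables above are simple transforms of a model's average cross-entropy on held-out text: perplexity exponentiates the per-token negative log-likelihood, and bits per character divides the per-character negative log-likelihood by ln 2 to convert nats to bits. A minimal sketch, using made-up log-probabilities rather than output from any model listed here:

```python
import math

def perplexity(token_log_probs):
    """exp(average negative log-likelihood), with natural-log inputs."""
    nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(nll)

def bits_per_character(char_log_probs):
    """Average negative log-likelihood per character, converted from nats to bits."""
    nll = -sum(char_log_probs) / len(char_log_probs)
    return nll / math.log(2)

# Hypothetical model that assigns probability 0.25 to every symbol:
lps = [math.log(0.25)] * 4
print(perplexity(lps))          # ~4.0: as uncertain as a uniform 4-way choice
print(bits_per_character(lps))  # ~2.0 bits per character
```

Lower is better for both metrics, and they are comparable only on the same test set with the same tokenization, which is why each table here fixes a single metric.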