SOTAVerified

Language Modelling

A language model is a model of natural language. Language models are useful for a variety of tasks, including speech recognition, machine translation, natural language generation (generating more human-like text), optical character recognition, route optimization, handwriting recognition, grammar induction, and information retrieval.

Large language models (LLMs), currently their most advanced form, are predominantly based on transformers trained on large datasets (frequently using text scraped from the public internet). They have superseded recurrent neural network-based models, which had previously superseded purely statistical models such as the word n-gram language model.

Source: Wikipedia
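
The word n-gram model mentioned above predicts each word from the n-1 words preceding it, using nothing but counts. Below is a minimal sketch of a bigram (n = 2) model with add-alpha smoothing; the toy corpus, function names, and smoothing constant are illustrative assumptions, not taken from any system listed on this page.

    from collections import Counter

    def train_bigram_lm(corpus):
        # Count unigram and bigram frequencies from a list of token lists,
        # padding each sentence with start/end markers.
        unigrams, bigrams = Counter(), Counter()
        for sentence in corpus:
            tokens = ["<s>"] + sentence + ["</s>"]
            unigrams.update(tokens)
            bigrams.update(zip(tokens, tokens[1:]))
        return unigrams, bigrams

    def bigram_prob(unigrams, bigrams, prev, word, alpha=1.0):
        # P(word | prev) with add-alpha smoothing over the observed vocabulary.
        vocab_size = len(unigrams)
        return (bigrams[(prev, word)] + alpha) / (unigrams[prev] + alpha * vocab_size)

    # Toy corpus; a real model would be trained on millions of tokens.
    corpus = [["the", "cat", "sat"], ["the", "dog", "sat"]]
    uni, bi = train_bigram_lm(corpus)
    print(bigram_prob(uni, bi, "the", "cat"))   # 0.25: seen bigram
    print(bigram_prob(uni, bi, "the", "fish"))  # 0.125: unseen pair, smoothed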

Papers

Showing 16901–16950 of 17610 papers

Title | Status | Hype
Sentiment Analysis on Monolingual, Multilingual and Code-Switching Twitter Corpora | – | 0
Parsing English into Abstract Meaning Representation Using Syntax-Based Machine Translation | – | 0
Learning Word Meanings and Grammar for Describing Everyday Activities in Smart Environments | – | 0
Reverse-engineering Language: A Study on the Semantic Compositionality of German Compounds | – | 0
ParFDA for Fast Deployment of Accurate Statistical Machine Translation Systems, Benchmarks, and Statistics | – | 0
Natural Language Generation from Pictographs | – | 0
Touch-Based Pre-Post-Editing of Machine Translation Output | – | 0
Open-Domain Name Error Detection using a Multi-Task RNN | – | 0
The RWTH Aachen German-English Machine Translation System for WMT 2015 | – | 0
Results of the WMT15 Tuning Shared Task | – | 0
Transducer Disambiguation with Sparse Topological Features | – | 0
SHEF-NN: Translation Quality Estimation with Neural Networks | – | 0
Reinforcing the Topic of Embeddings with Theta Pure Dependence for Text Classification | – | 0
Statistical Machine Translation Improvement based on Phrase Selection | – | 0
Predicting Pronouns across Languages with Continuous Word Spaces | – | 0
Script Induction as Language Modeling | – | 0
Predicting Machine Translation Adequacy with Document Embeddings | – | 0
Statistical Machine Translation with Automatic Identification of Translationese | – | 0
The Karlsruhe Institute of Technology Translation Systems for the WMT 2015 | – | 0
Auto-Sizing Neural Networks: With Applications to n-gram Language Models | – | 0
End-to-End Attention-based Large Vocabulary Speech Recognition | Code | 0
Probabilistic Modelling of Morphologically Rich Languages | – | 0
Online Representation Learning in Recurrent Neural Language Models | – | 0
Depth-Gated LSTM | – | 0
Deep Graph Kernels | – | 0
Finding Function in Form: Compositional Character Models for Open Vocabulary Word Representation | Code | 0
An End-to-End Neural Network for Polyphonic Piano Music Transcription | Code | 0
Sublinear Partition Estimation | Code | 0
Handwritten Text Recognition Results on the Bentham Collection with Improved Classical N-Gram-HMM methods | – | 0
Classifying Syntactic Categories in the Chinese Dependency Network | – | 0
Discriminative Segmental Cascades for Feature-Rich Phone Recognition | – | 0
Grid Long Short-Term Memory | Code | 0
Dependency Recurrent Neural Language Models for Sentence Completion | Code | 0
genCNN: A Convolutional Architecture for Word Sequence Prediction | – | 0
Bilingual Word Embeddings from Non-Parallel Document-Aligned Data Applied to Bilingual Lexicon Induction | – | 0
Analyzing Optimization for Statistical Machine Translation: MERT Learns Verbosity, PRO Learns Length | – | 0
Chinese Grammatical Error Diagnosis System Based on Hybrid Model | – | 0
Deep Markov Neural Network for Sequential Data Classification | – | 0
Incremental Recurrent Neural Network Dependency Parser with Search-based Discriminative Training | – | 0
Evaluating distributed word representations for capturing semantics of biomedical concepts | – | 0
DCU-ADAPT: Learning Edit Operations for Microblog Normalisation with the Generalised Perceptron | – | 0
Chinese Spelling Check System Based on N-gram Model | – | 0
Inducing Word and Part-of-Speech with Pitman-Yor Hidden Semi-Markov Models | – | 0
Arib@QALB-2015 Shared Task: A Hybrid Cascade Model for Arabic Spelling Error Detection and Correction | – | 0
Improving Pivot Translation by Remembering the Pivot | – | 0
Entity Retrieval via Entity Factoid Hierarchy | – | 0
Dependency Parsing with Graph Rewriting | – | 0
Generative Incremental Dependency Parsing with Neural Networks | – | 0
Instance Selection Improves Cross-Lingual Model Training for Fine-Grained Sentiment Analysis | – | 0
GWU-HASP-2015@QALB-2015 Shared Task: Priming Spelling Candidates with Probability | – | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | Decay RNN | Validation perplexity | 76.67 | – | Unverified
2 | GRU | Validation perplexity | 53.78 | – | Unverified
3 | LSTM | Validation perplexity | 52.73 | – | Unverified
4 | LSTM | Test perplexity | 48.7 | – | Unverified
5 | Temporal CNN | Test perplexity | 45.2 | – | Unverified
6 | TCN | Test perplexity | 45.19 | – | Unverified
7 | GCNN-8 | Test perplexity | 44.9 | – | Unverified
8 | Neural cache model (size = 100) | Test perplexity | 44.8 | – | Unverified
9 | Neural cache model (size = 2,000) | Test perplexity | 40.8 | – | Unverified
10 | GPT-2 Small | Test perplexity | 37.5 | – | Unverified
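
For reference, the perplexity scores in these tables are the exponential of a model's average negative log-likelihood per token on the held-out set, so lower is better. A small sketch of that conversion; the loss value below is illustrative, chosen only because it lands near the best claimed entry above.

    import math

    def perplexity(mean_nll_nats):
        # Perplexity = exp(average negative log-likelihood per token, in nats).
        return math.exp(mean_nll_nats)

    # A mean test loss of ~3.62 nats/token implies a perplexity of ~37.3,
    # roughly the scale of the GPT-2 Small entry above.
    print(perplexity(3.62))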

# | Model | Metric | Claimed | Verified | Status
1 | TCN | Test perplexity | 108.47 | – | Unverified
2 | Seq-U-Net | Test perplexity | 107.95 | – | Unverified
3 | GRU (Bai et al., 2018) | Test perplexity | 92.48 | – | Unverified
4 | R-Transformer | Test perplexity | 84.38 | – | Unverified
5 | Zaremba et al. (2014) - LSTM (medium) | Test perplexity | 82.7 | – | Unverified
6 | Gal & Ghahramani (2016) - Variational LSTM (medium) | Test perplexity | 79.7 | – | Unverified
7 | LSTM (Bai et al., 2018) | Test perplexity | 78.93 | – | Unverified
8 | Zaremba et al. (2014) - LSTM (large) | Test perplexity | 78.4 | – | Unverified
9 | Gal & Ghahramani (2016) - Variational LSTM (large) | Test perplexity | 75.2 | – | Unverified
10 | Inan et al. (2016) - Variational RHN | Test perplexity | 66 | – | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | LSTM (7 layers) | Bits per character (BPC) | 1.67 | – | Unverified
2 | Hypernetworks | Bits per character (BPC) | 1.34 | – | Unverified
3 | SHA-LSTM (4 layers, h=1024, no attention head) | Bits per character (BPC) | 1.33 | – | Unverified
4 | LN HM-LSTM | Bits per character (BPC) | 1.32 | – | Unverified
5 | ByteNet | Bits per character (BPC) | 1.31 | – | Unverified
6 | Recurrent Highway Networks | Bits per character (BPC) | 1.27 | – | Unverified
7 | Large FS-LSTM-4 | Bits per character (BPC) | 1.25 | – | Unverified
8 | Large mLSTM | Bits per character (BPC) | 1.24 | – | Unverified
9 | AWD-LSTM (3 layers) | Bits per character (BPC) | 1.23 | – | Unverified
10 | Cluster-Former (#C=512) | Bits per character (BPC) | 1.22 | – | Unverified
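
Bits per character is the same quantity in base 2, measured per character rather than per token: BPC is the average negative log2-likelihood per character, so 2^BPC gives the implied per-character perplexity. A one-line conversion sketch; the sample values are simply the worst and best claimed scores in the table above.

    def bpc_to_char_perplexity(bpc):
        # Per-character perplexity implied by a bits-per-character score.
        return 2 ** bpc

    for bpc in (1.67, 1.22):
        print(bpc, "->", round(bpc_to_char_perplexity(bpc), 2))  # 3.18 and 2.33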

# | Model | Metric | Claimed | Verified | Status
1 | Smaller Transformer 126M (pre-trained) | Test perplexity | 33 | – | Unverified
2 | OPT 125M | Test perplexity | 32.26 | – | Unverified
3 | Larger Transformer 771M (pre-trained) | Test perplexity | 28.1 | – | Unverified
4 | OPT 1.3B | Test perplexity | 19.55 | – | Unverified
5 | GPT-Neo 125M | Test perplexity | 17.83 | – | Unverified
6 | OPT 2.7B | Test perplexity | 17.81 | – | Unverified
7 | Smaller Transformer 126M (fine-tuned) | Test perplexity | 12 | – | Unverified
8 | GPT-Neo 1.3B | Test perplexity | 11.46 | – | Unverified
9 | Transformer 125M | Test perplexity | 10.7 | – | Unverified
10 | GPT-Neo 2.7B | Test perplexity | 10.44 | – | Unverified