SOTAVerified

Language Modelling

A language model is a probabilistic model of natural language. Language models are useful for a variety of tasks, including speech recognition, machine translation, natural language generation (generating more human-like text), optical character recognition, route optimization, handwriting recognition, grammar induction, and information retrieval.

Large language models (LLMs), currently their most advanced form, are predominantly based on transformers trained on large datasets (frequently using text scraped from the public internet). They have superseded recurrent neural network-based models, which had previously superseded purely statistical models such as the word n-gram language model.

Source: Wikipedia
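
To ground the contrast above, here is a minimal sketch of the kind of purely statistical model that neural approaches superseded: a word bigram language model estimated from counts, with add-alpha smoothing. The toy corpus and the smoothing choice are illustrative assumptions, not taken from this page.

```python
from collections import Counter, defaultdict

def train_bigram(corpus):
    """Count unigram contexts and bigram continuations from whitespace-tokenized sentences."""
    context_counts = Counter()
    bigram_counts = defaultdict(Counter)
    for sentence in corpus:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        for w1, w2 in zip(tokens, tokens[1:]):
            context_counts[w1] += 1
            bigram_counts[w1][w2] += 1
    return context_counts, bigram_counts

def prob(context_counts, bigram_counts, w1, w2, alpha=1.0):
    """Add-alpha smoothed estimate of P(w2 | w1)."""
    vocab_size = len(context_counts) + 1  # observed contexts plus the </s> symbol
    return (bigram_counts[w1][w2] + alpha) / (context_counts[w1] + alpha * vocab_size)

# Toy corpus (illustrative only).
corpus = ["the cat sat", "the dog sat", "a cat ran"]
contexts, bigrams = train_bigram(corpus)
print(prob(contexts, bigrams, "the", "cat"))  # seen pair: relatively high
print(prob(contexts, bigrams, "the", "ran"))  # unseen pair: falls back to smoothing mass
```

An LLM models the same conditional distribution over the next token, but replaces the count table with a neural network conditioned on a much longer context.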

Papers

Showing 17,501–17,550 of 17,610 papers

Title | Status | Hype
Decoding Running Key Ciphers | – | 0
Bootstrapping a Unified Model of Lexical and Phonetic Acquisition | – | 0
Computational Approaches to Sentence Completion | – | 0
Improving Word Representations via Global Context and Multiple Word Prototypes | – | 0
Deciphering Foreign Language by Combining Language Models and Context Vectors | – | 0
Fast Syntactic Analysis for Statistical Language Modeling via Substructure Sharing and Uptraining | – | 0
Combining Word-Level and Character-Level Models for Machine Translation Between Closely-Related Languages | – | 0
Text Segmentation by Language Using Minimum Description Length | – | 0
Topic Models for Dynamic Translation Model Adaptation | – | 0
Mixing Multiple Translation Models in Statistical Machine Translation | – | 0
Large-Scale Syntactic Language Modeling with Treelets | – | 0
Translation Model Adaptation for Statistical Machine Translation with Monolingual Topic Information | – | 0
Unsupervised Semantic Role Induction with Global Role Ordering | – | 0
Word Sense Disambiguation Improves Information Retrieval | – | 0
Utilizing Dependency Language Models for Graph-based Dependency Parsing Models | – | 0
A Joint Model of Language and Perception for Grounded Attribute Learning | – | 0
Modeling Temporal Dependencies in High-Dimensional Sequences: Application to Polyphonic Music Generation and Transcription | Code | 0
Application d'un algorithme de traduction statistique à la normalisation de textos (Applying a Statistical Machine Translation Algorithm to SMS Text Message Normalization) [in French] | – | 0
Impact du degré de supervision sur l'adaptation à un domaine d'un modèle de langage à partir du Web (Impact of the level of supervision on Web-based language model domain adaptation) [in French] | – | 0
Analysing the Effect of Out-of-Domain Data on SMT Systems | – | 0
Twitter Translation using Translation-Based Cross-Lingual Retrieval | – | 0
UPM system for WMT 2012 | – | 0
Rel-grams: A Probabilistic Model of Relations in Text | – | 0
Quality estimation for Machine Translation output using linguistic analysis and decoding features | – | 0
The UPC Submission to the WMT 2012 Shared Task on Quality Estimation | – | 0
Selecting Data for English-to-Czech Machine Translation | – | 0
Towards Effective Use of Training Data in Statistical Machine Translation | – | 0
Morpheme- and POS-based IBM1 and language model scores for translation quality estimation | – | 0
The RWTH Aachen Machine Translation System for WMT 2012 | – | 0
PROMT DeepHybrid system for WMT12 shared translation task | – | 0
The Karlsruhe Institute of Technology Translation Systems for the WMT 2012 | – | 0
LIUM's SMT Machine Translation Systems for WMT 2012 | – | 0
Kriya - The SFU System for Translation Task at WMT-12 | – | 0
Learning to Interpret Natural Language Instructions | – | 0
KU Leuven at HOO-2012: A Hybrid Approach to Detection and Correction of Determiner and Preposition Errors in Non-native English Text | – | 0
Analyse des performances de modèles de langage sub-lexicale pour des langues peu-dotées à morphologie riche (Performance analysis of sub-word language modeling for under-resourced languages with rich morphology: case study on Swahili and Amharic) [in French] | – | 0
Deep Neural Network Language Models | – | 0
Continuous Space Translation Models with Neural Networks | – | 0
Implicitly Intersecting Weighted Automata using Dual Decomposition | – | 0
Beauty Before Age? Applying Subjectivity to Automatic English Adjective Ordering | – | 0
A Comparative Investigation of Morphological Language Modeling for the Languages of the European Union | – | 0
A Challenge Set for Advancing Language Modeling | – | 0
Identifying Comparable Corpora Using LDA | – | 0
Towards Using EEG to Improve ASR Accuracy | – | 0
On-Demand Distributional Semantic Distance and Paraphrasing | – | 0
Measuring the Influence of Long Range Dependencies with Neural Network Language Models | – | 0
Proceedings of the NAACL-HLT 2012 Workshop: Will We Ever Really Replace the N-gram Model? On the Future of Language Modeling for HLT | – | 0
The Intelius Nickname Collection: Quantitative Analyses from Billions of Public Records | – | 0
Large, Pruned or Continuous Space Language Models on a GPU for Statistical Machine Translation | – | 0
Toward Tree Substitution Grammars with Latent Annotations | – | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | Decay RNN | Validation perplexity | 76.67 | – | Unverified
2 | GRU | Validation perplexity | 53.78 | – | Unverified
3 | LSTM | Validation perplexity | 52.73 | – | Unverified
4 | LSTM | Test perplexity | 48.7 | – | Unverified
5 | Temporal CNN | Test perplexity | 45.2 | – | Unverified
6 | TCN | Test perplexity | 45.19 | – | Unverified
7 | GCNN-8 | Test perplexity | 44.9 | – | Unverified
8 | Neural cache model (size = 100) | Test perplexity | 44.8 | – | Unverified
9 | Neural cache model (size = 2,000) | Test perplexity | 40.8 | – | Unverified
10 | GPT-2 Small | Test perplexity | 37.5 | – | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | TCN | Test perplexity | 108.47 | – | Unverified
2 | Seq-U-Net | Test perplexity | 107.95 | – | Unverified
3 | GRU (Bai et al., 2018) | Test perplexity | 92.48 | – | Unverified
4 | R-Transformer | Test perplexity | 84.38 | – | Unverified
5 | Zaremba et al. (2014) - LSTM (medium) | Test perplexity | 82.7 | – | Unverified
6 | Gal & Ghahramani (2016) - Variational LSTM (medium) | Test perplexity | 79.7 | – | Unverified
7 | LSTM (Bai et al., 2018) | Test perplexity | 78.93 | – | Unverified
8 | Zaremba et al. (2014) - LSTM (large) | Test perplexity | 78.4 | – | Unverified
9 | Gal & Ghahramani (2016) - Variational LSTM (large) | Test perplexity | 75.2 | – | Unverified
10 | Inan et al. (2016) - Variational RHN | Test perplexity | 66 | – | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | LSTM (7 layers) | Bits per character (BPC) | 1.67 | – | Unverified
2 | Hypernetworks | Bits per character (BPC) | 1.34 | – | Unverified
3 | SHA-LSTM (4 layers, h=1024, no attention head) | Bits per character (BPC) | 1.33 | – | Unverified
4 | LN HM-LSTM | Bits per character (BPC) | 1.32 | – | Unverified
5 | ByteNet | Bits per character (BPC) | 1.31 | – | Unverified
6 | Recurrent Highway Networks | Bits per character (BPC) | 1.27 | – | Unverified
7 | Large FS-LSTM-4 | Bits per character (BPC) | 1.25 | – | Unverified
8 | Large mLSTM | Bits per character (BPC) | 1.24 | – | Unverified
9 | AWD-LSTM (3 layers) | Bits per character (BPC) | 1.23 | – | Unverified
10 | Cluster-Former (#C=512) | Bits per character (BPC) | 1.22 | – | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Smaller Transformer 126M (pre-trained) | Test perplexity | 33 | – | Unverified
2 | OPT 125M | Test perplexity | 32.26 | – | Unverified
3 | Larger Transformer 771M (pre-trained) | Test perplexity | 28.1 | – | Unverified
4 | OPT 1.3B | Test perplexity | 19.55 | – | Unverified
5 | GPT-Neo 125M | Test perplexity | 17.83 | – | Unverified
6 | OPT 2.7B | Test perplexity | 17.81 | – | Unverified
7 | Smaller Transformer 126M (fine-tuned) | Test perplexity | 12 | – | Unverified
8 | GPT-Neo 1.3B | Test perplexity | 11.46 | – | Unverified
9 | Transformer 125M | Test perplexity | 10.7 | – | Unverified
10 | GPT-Neo 2.7B | Test perplexity | 10.44 | – | Unverified
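
The leaderboards above report two metrics: perplexity (the exponentiated average negative log-likelihood per token; lower is better) and bits per character (the base-2 cross-entropy per character; lower is better). A minimal sketch of how both are computed from per-token log-probabilities; the numbers are made-up illustrative values, not drawn from any model above.

```python
import math

def perplexity(token_log_probs):
    """exp of the average negative natural-log probability per token."""
    return math.exp(-sum(token_log_probs) / len(token_log_probs))

def bits_per_character(token_log_probs, num_chars):
    """Total negative log2 probability divided by the number of characters."""
    total_bits = -sum(token_log_probs) / math.log(2)
    return total_bits / num_chars

# Hypothetical per-token natural-log probabilities for a 5-token sentence.
log_probs = [-3.2, -1.1, -4.0, -0.7, -2.5]
print(f"perplexity: {perplexity(log_probs):.2f}")  # ~9.97

# If the same 5 tokens span 23 characters, the same likelihood expressed in BPC:
print(f"BPC: {bits_per_character(log_probs, 23):.3f}")  # ~0.721
```

Because the two metrics normalize by different units (tokens vs. characters), a word-perplexity table and a BPC table are not directly comparable across leaderboards.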