SOTAVerified

Language Modelling

A language model is a probabilistic model of natural language. Language models are useful for a variety of tasks, including speech recognition, machine translation, natural language generation (generating more human-like text), optical character recognition, route optimization, handwriting recognition, grammar induction, and information retrieval.

Large language models (LLMs), currently their most advanced form, are predominantly based on transformers trained on very large datasets (frequently, text scraped from the public internet). They have superseded recurrent neural network-based models, which had previously superseded purely statistical models such as the word n-gram language model.
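To make the n-gram idea concrete, here is a minimal sketch of a word bigram language model with add-k smoothing. The corpus, function names, and smoothing constant are illustrative assumptions, not part of any system described on this page.

```python
from collections import Counter

def train_bigram_lm(corpus):
    """Count unigrams and bigrams over a tokenized corpus."""
    unigrams = Counter()
    bigrams = Counter()
    for sentence in corpus:
        tokens = ["<s>"] + sentence + ["</s>"]
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams

def bigram_prob(unigrams, bigrams, prev, word, vocab_size, k=1.0):
    """P(word | prev) with add-k smoothing so unseen bigrams get nonzero mass."""
    return (bigrams[(prev, word)] + k) / (unigrams[prev] + k * vocab_size)

corpus = [["the", "cat", "sat"], ["the", "dog", "sat"]]
unigrams, bigrams = train_bigram_lm(corpus)
vocab_size = len(unigrams)  # includes <s> and </s>
p = bigram_prob(unigrams, bigrams, "the", "cat", vocab_size)
print(p)  # (1 + 1) / (2 + 1 * 6) = 0.25
```

In practice, n-gram models use higher orders (trigrams and up) with more careful smoothing (e.g. Kneser-Ney), but the count-and-normalize structure is the same.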

Source: Wikipedia

Papers

Showing 16901–16950 of 17610 papers

Title | Status | Hype
Morphological Analysis for Unsegmented Languages using Recurrent Neural Network Language Model | — | 0
Key Concept Identification for Medical Information Retrieval | — | 0
Script Induction as Language Modeling | — | 0
The Forest Convolutional Network: Compositional Distributional Semantics with a Neural Chart and without Binarization | — | 0
Investigating Continuous Space Language Models for Machine Translation Quality Estimation | — | 0
Open-Domain Name Error Detection using a Multi-Task RNN | — | 0
Pre-Computable Multi-Layer Neural Network Language Models | — | 0
Touch-Based Pre-Post-Editing of Machine Translation Output | — | 0
Learning Word Meanings and Grammar for Describing Everyday Activities in Smart Environments | — | 0
Reverse-engineering Language: A Study on the Semantic Compositionality of German Compounds | — | 0
Parsing English into Abstract Meaning Representation Using Syntax-Based Machine Translation | — | 0
Transducer Disambiguation with Sparse Topological Features | — | 0
A Neural Algorithm of Artistic Style | Code | 1
Character-Aware Neural Language Models | Code | 1
Auto-Sizing Neural Networks: With Applications to n-gram Language Models | — | 0
Probabilistic Modelling of Morphologically Rich Languages | — | 0
End-to-End Attention-based Large Vocabulary Speech Recognition | Code | 0
Depth-Gated LSTM | — | 0
Online Representation Learning in Recurrent Neural Language Models | — | 0
Deep Graph Kernels | — | 0
Finding Function in Form: Compositional Character Models for Open Vocabulary Word Representation | Code | 0
Sublinear Partition Estimation | Code | 0
An End-to-End Neural Network for Polyphonic Piano Music Transcription | Code | 0
Listen, Attend and Spell | Code | 1
Handwritten Text Recognition Results on the Bentham Collection with Improved Classical N-Gram-HMM methods | — | 0
Classifying Syntactic Categories in the Chinese Dependency Network | — | 0
Discriminative Segmental Cascades for Feature-Rich Phone Recognition | — | 0
Grid Long Short-Term Memory | Code | 0
Dependency Recurrent Neural Language Models for Sentence Completion | Code | 0
Evaluating distributed word representations for capturing semantics of biomedical concepts | — | 0
GWU-HASP-2015@QALB-2015 Shared Task: Priming Spelling Candidates with Probability | — | 0
DCU-ADAPT: Learning Edit Operations for Microblog Normalisation with the Generalised Perceptron | — | 0
Chinese Grammatical Error Diagnosis System Based on Hybrid Model | — | 0
Arib@QALB-2015 Shared Task: A Hybrid Cascade Model for Arabic Spelling Error Detection and Correction | — | 0
Chinese Spelling Check System Based on N-gram Model | — | 0
Using word embedding for bio-event extraction | — | 0
Word Vector/Conditional Random Field-based Chinese Spelling Error Detection for SIGHAN-2015 Evaluation | — | 0
Passive and Pervasive Use of Bilingual Dictionary in Statistical Machine Translation | — | 0
Multi-system machine translation using online APIs for English-Latvian | Code | 0
QCRI@QALB-2015 Shared Task: Correction of Arabic Text for Native and Non-Native Speakers' Errors | — | 0
QCMUQ@QALB-2015 Shared Task: Combining Character level MT and Error-tolerant Finite-State Recognition for Arabic Spelling Correction | — | 0
Toward Tweets Normalization Using Maximum Entropy | — | 0
Neural Network Transduction Models in Transliteration Generation | — | 0
Dependency Parsing with Graph Rewriting | — | 0
Analyzing Optimization for Statistical Machine Translation: MERT Learns Verbosity, PRO Learns Length | — | 0
Incremental Recurrent Neural Network Dependency Parser with Search-based Discriminative Training | — | 0
Instance Selection Improves Cross-Lingual Model Training for Fine-Grained Sentiment Analysis | — | 0
Symmetric Pattern Based Word Embeddings for Improved Word Similarity Prediction | — | 0
Tibetan Unknown Word Identification from News Corpora for Supporting Lexicon-based Tibetan Word Segmentation | — | 0
Reducing infrequent-token perplexity via variational corpora | — | 0
Page 339 of 353

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | Decay RNN | Validation perplexity | 76.67 | — | Unverified
2 | GRU | Validation perplexity | 53.78 | — | Unverified
3 | LSTM | Validation perplexity | 52.73 | — | Unverified
4 | LSTM | Test perplexity | 48.7 | — | Unverified
5 | Temporal CNN | Test perplexity | 45.2 | — | Unverified
6 | TCN | Test perplexity | 45.19 | — | Unverified
7 | GCNN-8 | Test perplexity | 44.9 | — | Unverified
8 | Neural cache model (size = 100) | Test perplexity | 44.8 | — | Unverified
9 | Neural cache model (size = 2,000) | Test perplexity | 40.8 | — | Unverified
10 | GPT-2 Small | Test perplexity | 37.5 | — | Unverified
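The perplexity numbers reported in these leaderboards are the exponentiated average negative log-likelihood per token: lower means the model assigns higher probability to the held-out text. A minimal sketch of the computation (the function name and toy inputs are illustrative assumptions):

```python
import math

def perplexity(token_log_probs):
    """Perplexity = exp of the mean negative natural-log probability per token."""
    n = len(token_log_probs)
    avg_nll = -sum(token_log_probs) / n
    return math.exp(avg_nll)

# A model that assigns probability 0.25 to each of four test tokens
# is as "confused" as a uniform choice among four options.
lps = [math.log(0.25)] * 4
print(perplexity(lps))  # ≈ 4.0
```

Note that scores are only comparable within one table: perplexity depends on the tokenization and vocabulary, so a word-level perplexity on one corpus cannot be compared directly with another corpus or segmentation.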
# | Model | Metric | Claimed | Verified | Status
1 | TCN | Test perplexity | 108.47 | — | Unverified
2 | Seq-U-Net | Test perplexity | 107.95 | — | Unverified
3 | GRU (Bai et al., 2018) | Test perplexity | 92.48 | — | Unverified
4 | R-Transformer | Test perplexity | 84.38 | — | Unverified
5 | Zaremba et al. (2014) - LSTM (medium) | Test perplexity | 82.7 | — | Unverified
6 | Gal & Ghahramani (2016) - Variational LSTM (medium) | Test perplexity | 79.7 | — | Unverified
7 | LSTM (Bai et al., 2018) | Test perplexity | 78.93 | — | Unverified
8 | Zaremba et al. (2014) - LSTM (large) | Test perplexity | 78.4 | — | Unverified
9 | Gal & Ghahramani (2016) - Variational LSTM (large) | Test perplexity | 75.2 | — | Unverified
10 | Inan et al. (2016) - Variational RHN | Test perplexity | 66 | — | Unverified
# | Model | Metric | Claimed | Verified | Status
1 | LSTM (7 layers) | Bits per Character (BPC) | 1.67 | — | Unverified
2 | Hypernetworks | Bits per Character (BPC) | 1.34 | — | Unverified
3 | SHA-LSTM (4 layers, h=1024, no attention head) | Bits per Character (BPC) | 1.33 | — | Unverified
4 | LN HM-LSTM | Bits per Character (BPC) | 1.32 | — | Unverified
5 | ByteNet | Bits per Character (BPC) | 1.31 | — | Unverified
6 | Recurrent Highway Networks | Bits per Character (BPC) | 1.27 | — | Unverified
7 | Large FS-LSTM-4 | Bits per Character (BPC) | 1.25 | — | Unverified
8 | Large mLSTM | Bits per Character (BPC) | 1.24 | — | Unverified
9 | AWD-LSTM (3 layers) | Bits per Character (BPC) | 1.23 | — | Unverified
10 | Cluster-Former (#C=512) | Bits per Character (BPC) | 1.22 | — | Unverified
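Bits per character is the base-2 cross-entropy per character, the standard metric for character-level benchmarks. It relates to character-level perplexity by perplexity = 2^BPC, so a BPC of 1.22 corresponds to roughly 2.33 effective character choices per step. A minimal conversion sketch (function names are illustrative assumptions):

```python
import math

def bpc_to_perplexity(bpc):
    """Character-level perplexity implied by a bits-per-character score."""
    return 2.0 ** bpc

def perplexity_to_bpc(ppl):
    """Inverse conversion: bits per character from character-level perplexity."""
    return math.log2(ppl)

print(bpc_to_perplexity(1.22))  # ≈ 2.33
print(perplexity_to_bpc(2.0))   # 1.0 bit: a coin flip per character
```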
# | Model | Metric | Claimed | Verified | Status
1 | Smaller Transformer 126M (pre-trained) | Test perplexity | 33 | — | Unverified
2 | OPT 125M | Test perplexity | 32.26 | — | Unverified
3 | Larger Transformer 771M (pre-trained) | Test perplexity | 28.1 | — | Unverified
4 | OPT 1.3B | Test perplexity | 19.55 | — | Unverified
5 | GPT-Neo 125M | Test perplexity | 17.83 | — | Unverified
6 | OPT 2.7B | Test perplexity | 17.81 | — | Unverified
7 | Smaller Transformer 126M (fine-tuned) | Test perplexity | 12 | — | Unverified
8 | GPT-Neo 1.3B | Test perplexity | 11.46 | — | Unverified
9 | Transformer 125M | Test perplexity | 10.7 | — | Unverified
10 | GPT-Neo 2.7B | Test perplexity | 10.44 | — | Unverified