SOTAVerified

Language Modelling

A language model is a model of natural language. Language models are useful for a variety of tasks, including speech recognition, machine translation, natural language generation (generating more human-like text), optical character recognition, route optimization, handwriting recognition, grammar induction, and information retrieval.

Large language models (LLMs), currently their most advanced form, are predominantly based on transformers trained on larger datasets (frequently using words scraped from the public internet). They have superseded recurrent neural network-based models, which had previously superseded purely statistical models such as the word n-gram language model.

Source: Wikipedia
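
For readers unfamiliar with the older statistical approach mentioned above, here is a minimal sketch of a word bigram (n = 2) language model; the add-alpha smoothing scheme and the toy corpus are illustrative choices, not something described on this page.

```python
from collections import Counter

def train_bigram_lm(tokens, alpha=1.0):
    """Estimate P(word | previous word) from raw counts, with add-alpha smoothing."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    vocab_size = len(unigrams)

    def prob(word, prev):
        # Smoothing gives unseen bigrams a small, nonzero probability.
        return (bigrams[(prev, word)] + alpha) / (unigrams[prev] + alpha * vocab_size)

    return prob

tokens = "the cat sat on the mat the cat slept".split()
p = train_bigram_lm(tokens)
print(p("cat", "the"))  # ~0.33: "the cat" occurs twice after "the"
print(p("mat", "cat"))  # ~0.13: "cat mat" is never observed, only smoothing mass
```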

Papers

Showing 15701–15750 of 17610 papers

Title | Status | Hype
Practical Text Classification With Large Pre-Trained Language Models | Code | 0
Quantification and Analysis of Scientific Language Variation Across Research Fields | – | 0
Parameter Re-Initialization through Cyclical Batch Size Schedules | – | 0
Comparing Neural- and N-Gram-Based Language Models for Word Segmentation | – | 0
Effectiveness of Character Language Model for Vietnamese Named Entity Recognition | – | 0
Universal Language Model Fine-tuning for Patent Classification | – | 0
Incorporating Context into Language Encoding Models for fMRI | – | 0
A Simple Cache Model for Image Recognition | – | 0
Deep Multimodal Learning: An Effective Method for Video Classification | – | 0
On the Computational Inefficiency of Large Batch Sizes for Stochastic Gradient Descent | – | 0
Acoustics-guided evaluation (AGE): a new measure for estimating performance of speech enhancement algorithms for robust ASR | – | 0
ESPNetv2: A Light-weight, Power Efficient, and General Purpose Convolutional Neural Network | Code | 0
Multi-level Multimodal Common Semantic Space for Image-Phrase Grounding | Code | 0
Are 2D-LSTM really dead for offline text recognition? | – | 0
Learning to Discover, Ground and Use Words with Segmental Neural Language Models | – | 0
Alignment Analysis of Sequential Segmentation of Lexicons to Improve Automatic Cognate Detection | Code | 0
Quantifying Uncertainties in Natural Language Processing Tasks | – | 0
Multi-cell LSTM Based Neural Language Model | – | 0
Extractive Summary as Discrete Latent Variables | – | 0
Exploring RNN-Transducer for Chinese Speech Recognition | – | 0
An Online Attention-based Model for Speech Recognition | – | 0
Unsupervised Transfer Learning for Spoken Language Understanding in Intelligent Agents | Code | 0
Modeling Local Dependence in Natural Language with Multi-channel Recurrent Neural Networks | – | 0
Modular Networks: Learning to Decompose Neural Computation | – | 0
Forecasting People's Needs in Hurricane Events from Social Network | – | 0
Fine-tuning of Language Models with Discriminator | – | 0
Federated Learning for Mobile Keyboard Prediction | Code | 0
Effective Representation for Easy-First Dependency Parsing | Code | 0
Compositional Language Understanding with Text-based Relational Reasoning | Code | 0
Language model integration based on memory control for sequence to sequence speech recognition | – | 0
Discriminative training of RNNLMs with the average word error criterion | – | 0
Transfer learning of language-independent end-to-end ASR with language model fusion | – | 0
The Marchex 2018 English Conversational Telephone Speech Recognition System | – | 0
Mesh-TensorFlow: Deep Learning for Supercomputers | – | 0
Do RNNs learn human-like abstract word order preferences? | Code | 0
Towards Unsupervised Speech-to-Text Translation | – | 0
Progress and Tradeoffs in Neural Language Models | – | 0
Sentence Encoders on STILTs: Supplementary Training on Intermediate Labeled-data Tasks | Code | 0
Training Neural Speech Recognition Systems with Synthetic Speech Augmentation | – | 0
Analysing Dropout and Compounding Errors in Neural Language Models | – | 0
Adversarial Training of End-to-end Speech Recognition Using a Criticizing Language Model | – | 0
Cycle-consistency training for end-to-end speech recognition | – | 0
Does Syntactic Knowledge in Multilingual Language Models Transfer Across Languages? | – | 0
Closing Brackets with Recurrent Neural Networks | – | 0
Convolutions Are All You Need (For Classifying Character Sequences) | – | 0
Importance of Self-Attention for Sentiment Analysis | – | 0
`Indicatements' that character language models learn English morpho-syntactic units and regularities | – | 0
Er ... well, it matters, right? On the role of data representations in spoken language dependency parsing | – | 0
Interpreting Word-Level Hidden State Behaviour of Character-Level LSTM Language Models | – | 0
Learning to Define Terms in the Software Domain | – | 0
Page 315 of 353

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | Decay RNN | Validation perplexity | 76.67 | – | Unverified
2 | GRU | Validation perplexity | 53.78 | – | Unverified
3 | LSTM | Validation perplexity | 52.73 | – | Unverified
4 | LSTM | Test perplexity | 48.7 | – | Unverified
5 | Temporal CNN | Test perplexity | 45.2 | – | Unverified
6 | TCN | Test perplexity | 45.19 | – | Unverified
7 | GCNN-8 | Test perplexity | 44.9 | – | Unverified
8 | Neural cache model (size = 100) | Test perplexity | 44.8 | – | Unverified
9 | Neural cache model (size = 2,000) | Test perplexity | 40.8 | – | Unverified
10 | GPT-2 Small | Test perplexity | 37.5 | – | Unverified
# | Model | Metric | Claimed | Verified | Status
1 | TCN | Test perplexity | 108.47 | – | Unverified
2 | Seq-U-Net | Test perplexity | 107.95 | – | Unverified
3 | GRU (Bai et al., 2018) | Test perplexity | 92.48 | – | Unverified
4 | R-Transformer | Test perplexity | 84.38 | – | Unverified
5 | Zaremba et al. (2014) - LSTM (medium) | Test perplexity | 82.7 | – | Unverified
6 | Gal & Ghahramani (2016) - Variational LSTM (medium) | Test perplexity | 79.7 | – | Unverified
7 | LSTM (Bai et al., 2018) | Test perplexity | 78.93 | – | Unverified
8 | Zaremba et al. (2014) - LSTM (large) | Test perplexity | 78.4 | – | Unverified
9 | Gal & Ghahramani (2016) - Variational LSTM (large) | Test perplexity | 75.2 | – | Unverified
10 | Inan et al. (2016) - Variational RHN | Test perplexity | 66 | – | Unverified
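
For context on the metric in the two tables above: test perplexity is the exponential of the average negative log-likelihood a model assigns to the test tokens, so lower is better. A minimal sketch of the computation (the function name and toy numbers are illustrative, not taken from this page):

```python
import math

def perplexity(log_probs):
    """Perplexity = exp of the mean negative natural-log likelihood per token."""
    nll = -sum(log_probs) / len(log_probs)
    return math.exp(nll)

# A model that assigns every test token probability 1/50 has perplexity 50.
print(perplexity([math.log(1 / 50)] * 10))  # 50.0
```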
# | Model | Metric | Claimed | Verified | Status
1 | LSTM (7 layers) | Bits per character (BPC) | 1.67 | – | Unverified
2 | Hypernetworks | Bits per character (BPC) | 1.34 | – | Unverified
3 | SHA-LSTM (4 layers, h=1024, no attention head) | Bits per character (BPC) | 1.33 | – | Unverified
4 | LN HM-LSTM | Bits per character (BPC) | 1.32 | – | Unverified
5 | ByteNet | Bits per character (BPC) | 1.31 | – | Unverified
6 | Recurrent Highway Networks | Bits per character (BPC) | 1.27 | – | Unverified
7 | Large FS-LSTM-4 | Bits per character (BPC) | 1.25 | – | Unverified
8 | Large mLSTM | Bits per character (BPC) | 1.24 | – | Unverified
9 | AWD-LSTM (3 layers) | Bits per character (BPC) | 1.23 | – | Unverified
10 | Cluster-Former (#C=512) | Bits per character (BPC) | 1.22 | – | Unverified
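
Bits per character (BPC), the metric in the table above, is the character-level analogue of perplexity: the mean negative base-2 log-probability per character, so 2**BPC is the per-character perplexity. A minimal sketch under the same illustrative assumptions:

```python
import math

def bits_per_character(log2_probs):
    """BPC = mean negative log2-probability per character of the test text."""
    return -sum(log2_probs) / len(log2_probs)

# A model uniform over 3 characters scores log2(3) ~ 1.585 bits per character.
bpc = bits_per_character([math.log2(1 / 3)] * 8)
print(bpc)       # ~1.585
print(2 ** bpc)  # ~3.0, the per-character perplexity
```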
# | Model | Metric | Claimed | Verified | Status
1 | Smaller Transformer 126M (pre-trained) | Test perplexity | 33 | – | Unverified
2 | OPT 125M | Test perplexity | 32.26 | – | Unverified
3 | Larger Transformer 771M (pre-trained) | Test perplexity | 28.1 | – | Unverified
4 | OPT 1.3B | Test perplexity | 19.55 | – | Unverified
5 | GPT-Neo 125M | Test perplexity | 17.83 | – | Unverified
6 | OPT 2.7B | Test perplexity | 17.81 | – | Unverified
7 | Smaller Transformer 126M (fine-tuned) | Test perplexity | 12 | – | Unverified
8 | GPT-Neo 1.3B | Test perplexity | 11.46 | – | Unverified
9 | Transformer 125M | Test perplexity | 10.7 | – | Unverified
10 | GPT-Neo 2.7B | Test perplexity | 10.44 | – | Unverified