SOTAVerified

Language Modelling

A language model is a probabilistic model of natural language. Language models are useful for a variety of tasks, including speech recognition, machine translation, natural language generation (generating more human-like text), optical character recognition, route optimization, handwriting recognition, grammar induction, and information retrieval.

Large language models (LLMs), currently their most advanced form, are predominantly based on transformers trained on larger datasets, frequently text scraped from the public internet. They have superseded recurrent neural network-based models, which had in turn superseded purely statistical models such as the word n-gram language model.

Source: Wikipedia
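
To make the contrast in the excerpt above concrete, here is a minimal sketch of the statistical baseline it mentions: a word bigram model with add-one (Laplace) smoothing. The toy corpus and function names are illustrative, not drawn from any paper listed on this page.

```python
from collections import Counter
import math

# Toy corpus; real n-gram models are estimated from millions of tokens.
corpus = "the cat sat on the mat . the dog sat on the rug .".split()

vocab = set(corpus)
unigrams = Counter(corpus)
bigrams = Counter(zip(corpus, corpus[1:]))

def bigram_prob(prev, word, alpha=1.0):
    """P(word | prev) with add-alpha (Laplace) smoothing."""
    return (bigrams[(prev, word)] + alpha) / (unigrams[prev] + alpha * len(vocab))

def log_prob(words):
    """Log-probability of a word sequence under the bigram model."""
    return sum(math.log(bigram_prob(p, w)) for p, w in zip(words, words[1:]))

print(log_prob("the dog sat on the mat .".split()))
```

A neural or transformer language model replaces these count-based estimates with probabilities computed from the full preceding context, but the interface is the same: assign a probability to each next token.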

Papers

Showing 11751–11800 of 17610 papers

Title | Status | Hype
NaturalProver: Grounded Mathematical Proof Generation with Language Models | Code | 1
Investigating Lexical Replacements for Arabic-English Code-Switched Data Augmentation | – | 0
Segmenting Numerical Substitution Ciphers | – | 0
Know Where You're Going: Meta-Learning for Parameter-Efficient Fine-Tuning | – | 0
Large Language Models are Few-Shot Clinical Information Extractors | – | 0
Multimodal Knowledge Alignment with Reinforcement Learning | Code | 1
Training Language Models with Memory Augmentation | Code | 1
Transcormer: Transformer for Sentence Scoring with Sliding Language Modeling | Code | 1
Garden-Path Traversal in GPT-2 | Code | 0
Toxicity Detection with Generative Prompt-based Inference | – | 0
K-12BERT: BERT for K-12 education | Code | 0
MaskEval: Weighted MLM-Based Evaluation for Text Summarization and Simplification | – | 0
RetroMAE: Pre-Training Retrieval-oriented Language Models Via Masked Auto-Encoder | Code | 2
Enhancing Continual Learning with Global Prototypes: Counteracting Negative Representation Drift | – | 0
PoeLM: A Meter- and Rhyme-Controllable Language Model for Unsupervised Poetry Generation | Code | 0
PERT: A New Solution to Pinyin to Character Conversion Task | Code | 0
Multi-Level Modeling Units for End-to-End Mandarin Speech Recognition | – | 0
On the Role of Bidirectionality in Language Model Pre-Training | – | 0
Formulating Few-shot Fine-tuning Towards Language Model Pre-training: A Pilot Study on Named Entity Recognition | Code | 0
Evaluating the Impact of Model Scale for Compositional Generalization in Semantic Parsing | – | 0
ATTEMPT: Parameter-Efficient Multi-task Tuning via Attentional Mixtures of Soft Prompts | Code | 1
Chunk-based Nearest Neighbor Machine Translation | Code | 0
GeoMLAMA: Geo-Diverse Commonsense Probing on Multilingual Pre-Trained Language Models | Code | 1
BanglaNLG and BanglaT5: Benchmarks and Resources for Evaluating Low-Resource Natural Language Generation in Bangla | Code | 1
Improving Short Text Classification With Augmented Data Using GPT-3 | – | 0
Challenges in Measuring Bias via Open-Ended Language Generation | Code | 0
On Measuring Social Biases in Prompt-Based Multi-Task Learning | Code | 1
Towards Automated Document Revision: Grammatical Error Correction, Fluency Edits, and Beyond | Code | 1
Looking for a Handsome Carpenter! Debiasing GPT-3 Job Advertisements | Code | 0
RL with KL penalties is better viewed as Bayesian inference | – | 0
Prompt Tuning for Discriminative Pre-trained Language Models | Code | 1
The Diminishing Returns of Masked Language Models to Science | – | 0
PEVL: Position-enhanced Pre-training and Prompt Tuning for Vision-language Models | Code | 1
Supporting Vision-Language Model Inference with Causality-pruning Knowledge Prompt | – | 0
BBTv2: Towards a Gradient-Free Future with Large Language Models | Code | 2
Memorization Without Overfitting: Analyzing the Training Dynamics of Large Language Models | – | 0
The Geometry of Multilingual Language Model Representations | Code | 1
Language Models with Image Descriptors are Strong Few-Shot Video-Language Learners | Code | 1
Housekeep: Tidying Virtual Households using Commonsense Reasoning | Code | 1
Named Entity Linking with Entity Representation by Multiple Embeddings | – | 0
Scenario-based Multi-product Advertising Copywriting Generation for E-Commerce | – | 0
DeepStruct: Pretraining of Language Models for Structure Prediction | Code | 1
Multilingual Normalization of Temporal Expressions with Masked Language Models | Code | 0
KERPLE: Kernelized Relative Positional Embedding for Length Extrapolation | Code | 1
Visually-Augmented Language Modeling | Code | 1
UViM: A Unified Modeling Approach for Vision with Learned Guiding Codes | – | 0
Progressive Class Semantic Matching for Semi-supervised Text Classification | Code | 0
RankGen: Improving Text Generation with Large Ranking Models | Code | 1
Self-training with Two-phase Self-augmentation for Few-shot Dialogue Generation | Code | 0
Nebula-I: A General Framework for Collaboratively Training Deep Learning Models on Low-Bandwidth Cloud Clusters | – | 0
Page 236 of 353

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | Decay RNN | Validation perplexity | 76.67 | – | Unverified
2 | GRU | Validation perplexity | 53.78 | – | Unverified
3 | LSTM | Validation perplexity | 52.73 | – | Unverified
4 | LSTM | Test perplexity | 48.7 | – | Unverified
5 | Temporal CNN | Test perplexity | 45.2 | – | Unverified
6 | TCN | Test perplexity | 45.19 | – | Unverified
7 | GCNN-8 | Test perplexity | 44.9 | – | Unverified
8 | Neural cache model (size = 100) | Test perplexity | 44.8 | – | Unverified
9 | Neural cache model (size = 2,000) | Test perplexity | 40.8 | – | Unverified
10 | GPT-2 Small | Test perplexity | 37.5 | – | Unverified
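
For reference, the perplexity reported in these tables is the exponentiated average negative log-likelihood per token on the held-out set; lower is better, and a perplexity of k means the model is, on average, as uncertain as a uniform choice among k tokens. A minimal sketch of the computation, with illustrative per-token log-probabilities:

```python
import math

def perplexity(token_log_probs):
    """Perplexity = exp(mean negative log-likelihood per token)."""
    nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(nll)

# Illustrative values of log P(token_i | context) over a 4-token test set.
print(perplexity([-2.1, -0.3, -4.0, -1.2]))  # ~6.7
```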
# | Model | Metric | Claimed | Verified | Status
1 | TCN | Test perplexity | 108.47 | – | Unverified
2 | Seq-U-Net | Test perplexity | 107.95 | – | Unverified
3 | GRU (Bai et al., 2018) | Test perplexity | 92.48 | – | Unverified
4 | R-Transformer | Test perplexity | 84.38 | – | Unverified
5 | Zaremba et al. (2014) - LSTM (medium) | Test perplexity | 82.7 | – | Unverified
6 | Gal & Ghahramani (2016) - Variational LSTM (medium) | Test perplexity | 79.7 | – | Unverified
7 | LSTM (Bai et al., 2018) | Test perplexity | 78.93 | – | Unverified
8 | Zaremba et al. (2014) - LSTM (large) | Test perplexity | 78.4 | – | Unverified
9 | Gal & Ghahramani (2016) - Variational LSTM (large) | Test perplexity | 75.2 | – | Unverified
10 | Inan et al. (2016) - Variational RHN | Test perplexity | 66 | – | Unverified
# | Model | Metric | Claimed | Verified | Status
1 | LSTM (7 layers) | Bits per character (BPC) | 1.67 | – | Unverified
2 | Hypernetworks | Bits per character (BPC) | 1.34 | – | Unverified
3 | SHA-LSTM (4 layers, h=1024, no attention head) | Bits per character (BPC) | 1.33 | – | Unverified
4 | LN HM-LSTM | Bits per character (BPC) | 1.32 | – | Unverified
5 | ByteNet | Bits per character (BPC) | 1.31 | – | Unverified
6 | Recurrent Highway Networks | Bits per character (BPC) | 1.27 | – | Unverified
7 | Large FS-LSTM-4 | Bits per character (BPC) | 1.25 | – | Unverified
8 | Large mLSTM | Bits per character (BPC) | 1.24 | – | Unverified
9 | AWD-LSTM (3 layers) | Bits per character (BPC) | 1.23 | – | Unverified
10 | Cluster-Former (#C=512) | Bits per character (BPC) | 1.22 | – | Unverified
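
Bits per character (BPC), used in the table above, is the same negative log-likelihood measured per character and in base 2, so character-level perplexity equals 2^BPC. A minimal sketch of the conversion, again with illustrative values:

```python
import math

def bits_per_character(char_log_probs):
    """BPC = mean of -log2 P(char | context) over the evaluation text."""
    return -sum(char_log_probs) / (len(char_log_probs) * math.log(2))

# Illustrative per-character natural-log probabilities.
bpc = bits_per_character([-0.9, -1.1, -0.7])
print(bpc, 2 ** bpc)  # BPC and the equivalent per-character perplexity
```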
# | Model | Metric | Claimed | Verified | Status
1 | Smaller Transformer 126M (pre-trained) | Test perplexity | 33 | – | Unverified
2 | OPT 125M | Test perplexity | 32.26 | – | Unverified
3 | Larger Transformer 771M (pre-trained) | Test perplexity | 28.1 | – | Unverified
4 | OPT 1.3B | Test perplexity | 19.55 | – | Unverified
5 | GPT-Neo 125M | Test perplexity | 17.83 | – | Unverified
6 | OPT 2.7B | Test perplexity | 17.81 | – | Unverified
7 | Smaller Transformer 126M (fine-tuned) | Test perplexity | 12 | – | Unverified
8 | GPT-Neo 1.3B | Test perplexity | 11.46 | – | Unverified
9 | Transformer 125M | Test perplexity | 10.7 | – | Unverified
10 | GPT-Neo 2.7B | Test perplexity | 10.44 | – | Unverified