SOTAVerified

Language Modelling

A language model is a model of natural language. Language models are useful for a variety of tasks, including speech recognition, machine translation, natural language generation (generating more human-like text), optical character recognition, route optimization, handwriting recognition, grammar induction, and information retrieval.

Large language models (LLMs), currently their most advanced form, are predominantly based on transformers trained on large datasets (frequently using text scraped from the public internet). They have superseded recurrent neural network-based models, which had in turn superseded purely statistical models such as the word n-gram language model.

Source: Wikipedia
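For readers new to the area, here is a minimal sketch of the purely statistical approach mentioned above: a word bigram language model with add-one (Laplace) smoothing. The class name and toy corpus are illustrative only; practical n-gram systems use higher orders and stronger smoothing such as Kneser-Ney.

```python
from collections import Counter
import math

class BigramLM:
    """Word bigram language model with add-one (Laplace) smoothing.

    A teaching sketch only: real n-gram systems use higher orders and
    stronger smoothing (e.g. Kneser-Ney).
    """

    def __init__(self, sentences):
        self.unigrams = Counter()
        self.bigrams = Counter()
        for tokens in sentences:
            padded = ["<s>"] + tokens + ["</s>"]
            self.unigrams.update(padded)
            self.bigrams.update(zip(padded, padded[1:]))
        self.vocab_size = len(self.unigrams)

    def logprob(self, prev, word):
        # P(word | prev) = (count(prev, word) + 1) / (count(prev) + V)
        numerator = self.bigrams[(prev, word)] + 1
        denominator = self.unigrams[prev] + self.vocab_size
        return math.log(numerator / denominator)

    def sentence_logprob(self, tokens):
        padded = ["<s>"] + tokens + ["</s>"]
        return sum(self.logprob(p, w) for p, w in zip(padded, padded[1:]))


# Toy corpus (illustrative only)
corpus = [["the", "cat", "sat"], ["the", "dog", "sat"]]
lm = BigramLM(corpus)
print(lm.sentence_logprob(["the", "cat", "sat"]))  # log-probability in nats
```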

Papers

Showing 10401–10450 of 17610 papers

Title | Status | Hype
NeuSTIP: A Novel Neuro-Symbolic Model for Link and Time Prediction in Temporal Knowledge Graphs | | 0
Neutral residues: revisiting adapters for model extension | | 0
NeutraSum: A Language Model can help a Balanced Media Diet by Neutralizing News Summaries | | 0
Nevermind: Instruction Override and Moderation in Large Language Models | | 0
Never too Prim to Swim: An LLM-Enhanced RL-based Adaptive S-Surface Controller for AUVs under Extreme Sea Conditions | | 0
NEVLP: Noise-Robust Framework for Efficient Vision-Language Pre-training | | 0
New Baseline in Automatic Speech Recognition for Northern Sámi | | 0
New Directions in Vector Space Models of Meaning | | 0
New Intent Discovery with Attracting and Dispersing Prototype | | 0
NewsBERT: Distilling Pre-trained Language Model for Intelligent News Application | | 0
NewsEdits 2.0: Learning the Intentions Behind Updating News | | 0
NewsNet-SDF: Stochastic Discount Factor Estimation with Pretrained Language Model News Embeddings via Adversarial Networks | | 0
New Textual Corpora for Serbian Language Modeling | | 0
Next Word Suggestion using Graph Neural Network | | 0
Nexus: An Omni-Perceptive And -Interactive Model for Language, Audio, And Vision | | 0
NGPU-LM: GPU-Accelerated N-Gram Language Model for Context-Biasing in Greedy ASR Decoding | | 0
Ngram2vec: Learning Improved Word Representations from Ngram Co-occurrence Statistics | | 0
N-gram-based Tense Models for Statistical Machine Translation | | 0
N-gram Counts and Language Models from the Common Crawl | | 0
N-gram Language Modeling using Recurrent Neural Network Estimation | | 0
N-gram language models for massively parallel devices | | 0
N-grammer: Augmenting Transformers with latent n-grams | | 0
N-gram Model for Chinese Grammatical Error Diagnosis | | 0
N-gram Prediction and Word Difference Representations for Language Modeling | | 0
N-grams Bayesian Differential Privacy | | 0
NICT Kyoto Submission for the WMT’20 Quality Estimation Task: Intermediate Training for Domain and Task Adaptation | | 0
NICT Kyoto Submission for the WMT’21 Quality Estimation Task: Multimetric Multilingual Pretraining for Critical Error Detection | | 0
NIFTY Financial News Headlines Dataset | | 0
niksss at SemEval-2022 Task 6: Are Traditionally Pre-Trained Contextual Embeddings Enough for Detecting Intended Sarcasm? | | 0
NILC at CWI 2018: Exploring Feature Engineering and Feature Learning | | 0
NILC-SWORNEMO at the Surface Realization Shared Task: Exploring Syntax-Based Word Ordering using Neural Models | | 0
NJU’s submission to the WMT20 QE Shared Task | | 0
Abstract Operations Research Modeling Using Natural Language Inputs | | 0
NL-Eye: Abductive NLI for Images | | 0
NLNDE at SemEval-2023 Task 12: Adaptive Pretraining and Source Language Selection for Low-Resource Multilingual Sentiment Analysis | | 0
NLP for Knowledge Discovery and Information Extraction from Energetics Corpora | | 0
NLP-PINGAN-TECH @ CL-SciSumm 2020 | | 0
NLP Service APIs and Models for Efficient Registration of New Clients | | 0
nlpUP at SemEval-2019 Task 6: A Deep Neural Language Model for Offensive Language Detection | | 0
nmT5 -- Is parallel data still relevant for pre-training massively multilingual language models? | | 0
NN-grams: Unifying neural network and n-gram language models for Speech Recognition | | 0
No Data to Crawl? Monolingual Corpus Creation from PDF Files of Truly low-Resource Languages in Peru | | 0
Node Level Graph Autoencoder: Unified Pretraining for Textual Graph Learning | | 0
Noise-Based Regularizers for Recurrent Neural Networks | | 0
Noise-BERT: A Unified Perturbation-Robust Framework with Noise Alignment Pre-training for Noisy Slot Filling Task | | 0
Noise Contrastive Estimation and Negative Sampling for Conditional Models: Consistency and Statistical Efficiency | | 0
Noise Injection Systemically Degrades Large Language Model Safety Guardrails | | 0
Noiser: Bounded Input Perturbations for Attributing Large Language Models | | 0
Page 209 of 353

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | Decay RNN | Validation perplexity | 76.67 | | Unverified
2 | GRU | Validation perplexity | 53.78 | | Unverified
3 | LSTM | Validation perplexity | 52.73 | | Unverified
4 | LSTM | Test perplexity | 48.7 | | Unverified
5 | Temporal CNN | Test perplexity | 45.2 | | Unverified
6 | TCN | Test perplexity | 45.19 | | Unverified
7 | GCNN-8 | Test perplexity | 44.9 | | Unverified
8 | Neural cache model (size = 100) | Test perplexity | 44.8 | | Unverified
9 | Neural cache model (size = 2,000) | Test perplexity | 40.8 | | Unverified
10 | GPT-2 Small | Test perplexity | 37.5 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | TCN | Test perplexity | 108.47 | | Unverified
2 | Seq-U-Net | Test perplexity | 107.95 | | Unverified
3 | GRU (Bai et al., 2018) | Test perplexity | 92.48 | | Unverified
4 | R-Transformer | Test perplexity | 84.38 | | Unverified
5 | Zaremba et al. (2014) - LSTM (medium) | Test perplexity | 82.7 | | Unverified
6 | Gal & Ghahramani (2016) - Variational LSTM (medium) | Test perplexity | 79.7 | | Unverified
7 | LSTM (Bai et al., 2018) | Test perplexity | 78.93 | | Unverified
8 | Zaremba et al. (2014) - LSTM (large) | Test perplexity | 78.4 | | Unverified
9 | Gal & Ghahramani (2016) - Variational LSTM (large) | Test perplexity | 75.2 | | Unverified
10 | Inan et al. (2016) - Variational RHN | Test perplexity | 66 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | LSTM (7 layers) | Bits per Character (BPC) | 1.67 | | Unverified
2 | Hypernetworks | Bits per Character (BPC) | 1.34 | | Unverified
3 | SHA-LSTM (4 layers, h=1024, no attention head) | Bits per Character (BPC) | 1.33 | | Unverified
4 | LN HM-LSTM | Bits per Character (BPC) | 1.32 | | Unverified
5 | ByteNet | Bits per Character (BPC) | 1.31 | | Unverified
6 | Recurrent Highway Networks | Bits per Character (BPC) | 1.27 | | Unverified
7 | Large FS-LSTM-4 | Bits per Character (BPC) | 1.25 | | Unverified
8 | Large mLSTM | Bits per Character (BPC) | 1.24 | | Unverified
9 | AWD-LSTM (3 layers) | Bits per Character (BPC) | 1.23 | | Unverified
10 | Cluster-Former (#C=512) | Bits per Character (BPC) | 1.22 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Smaller Transformer 126M (pre-trained) | Test perplexity | 33 | | Unverified
2 | OPT 125M | Test perplexity | 32.26 | | Unverified
3 | Larger Transformer 771M (pre-trained) | Test perplexity | 28.1 | | Unverified
4 | OPT 1.3B | Test perplexity | 19.55 | | Unverified
5 | GPT-Neo 125M | Test perplexity | 17.83 | | Unverified
6 | OPT 2.7B | Test perplexity | 17.81 | | Unverified
7 | Smaller Transformer 126M (fine-tuned) | Test perplexity | 12 | | Unverified
8 | GPT-Neo 1.3B | Test perplexity | 11.46 | | Unverified
9 | Transformer 125M | Test perplexity | 10.7 | | Unverified
10 | GPT-Neo 2.7B | Test perplexity | 10.44 | | Unverified
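
Both metrics in these tables derive from a model's average negative log-likelihood (NLL) on the test set: perplexity exponentiates the per-word NLL measured in nats, and bits per character is the per-character NLL converted from nats to bits. A minimal sketch of the conversions (function names and example numbers are illustrative, not tied to any specific run above):

```python
import math

def perplexity(total_nll_nats, num_tokens):
    # Perplexity = exp(average negative log-likelihood per token, in nats)
    return math.exp(total_nll_nats / num_tokens)

def bits_per_character(total_nll_nats, num_chars):
    # BPC = average per-character negative log-likelihood, converted
    # from nats to bits by dividing by ln 2
    return total_nll_nats / num_chars / math.log(2)

# Illustrative numbers only: an average word-level NLL of ~3.62 nats
# gives a perplexity near the 37.5 claimed for GPT-2 Small above, and
# ~0.85 nats per character gives a BPC near the 1.23 claimed for AWD-LSTM.
print(perplexity(3.62 * 1000, 1000))          # ~37.3
print(bits_per_character(0.85 * 1000, 1000))  # ~1.23
```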