SOTAVerified

Language Modelling

A language model is a model of natural language. Language models are useful for a variety of tasks, including speech recognition, machine translation, natural language generation (generating more human-like text), optical character recognition, route optimization, handwriting recognition, grammar induction, and information retrieval.

Large language models (LLMs), currently their most advanced form, are predominantly based on transformers trained on large datasets (frequently using text scraped from the public internet). They have superseded recurrent neural network-based models, which had previously superseded purely statistical models such as the word n-gram language model.

Source: Wikipedia
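
The word n-gram approach mentioned above estimates each word's probability from counts of short word sequences in a training corpus. A minimal sketch of a bigram model with add-one smoothing follows; the toy corpus, function names, and smoothing choice are illustrative assumptions, not taken from any paper listed below:

```python
from collections import Counter

def train_bigram(corpus):
    """Count unigrams and bigrams over tokenized sentences."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in corpus:
        tokens = ["<s>"] + sentence + ["</s>"]  # sentence boundary markers
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams

def bigram_prob(unigrams, bigrams, prev, word):
    """P(word | prev) with add-one (Laplace) smoothing over the vocabulary."""
    return (bigrams[(prev, word)] + 1) / (unigrams[prev] + len(unigrams))

# Toy corpus (illustrative): "the" is followed by "cat" and "dog" once each.
corpus = [["the", "cat", "sat"], ["the", "dog", "sat"]]
uni, bi = train_bigram(corpus)
print(bigram_prob(uni, bi, "the", "cat"))  # 0.25
print(bigram_prob(uni, bi, "the", "sat"))  # unseen pair -> 0.125
```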

Papers

Showing 13251-13300 of 17610 papers

Title | Status | Hype
R2D2: Recursive Transformer based on Differentiable Tree for Interpretable Hierarchical Language Modeling | Code | 1
Projection of Turn Completion in Incremental Spoken Dialogue Systems | - | 0
Getting to Production with Few-shot Natural Language Generation Models | - | 0
Cross-Lingual Transfer Learning for Statistical Type Inference | - | 0
Word-Free Spoken Language Understanding for Mandarin-Chinese | - | 0
XLM-E: Cross-lingual Language Model Pre-training via ELECTRA | Code | 1
ChineseBERT: Chinese Pretraining Enhanced by Glyph and Pinyin Information | Code | 1
A Simple and Efficient Probabilistic Language model for Code-Mixed Text | - | 0
A Knowledge-Grounded Dialog System Based on Pre-Trained Language Models | - | 0
What's in a Measurement? Using GPT-3 on SemEval 2021 Task 8 -- MeasEval | - | 0
R-Drop: Regularized Dropout for Neural Networks | Code | 1
Stabilizing Equilibrium Models by Jacobian Regularization | Code | 1
SymbolicGPT: A Generative Transformer Model for Symbolic Regression | Code | 1
Visual Conceptual Blending with Large-scale Language and Vision Models | - | 0
Toward Less Hidden Cost of Code Completion with Acceptance and Ranking Models | - | 0
Multimodal Few-Shot Learning with Frozen Language Models | - | 0
Language Models are Good Translators | - | 0
Learning to Sample Replacements for ELECTRA Pre-Training | - | 0
Winner Team Mia at TextVQA Challenge 2021: Vision-and-Language Representation Learning with Pre-trained Sequence-to-Sequence Model | - | 0
QASR: QCRI Aljazeera Speech Resource -- A Large Scale Annotated Arabic Speech Corpus | - | 0
Multi-objective Asynchronous Successive Halving | Code | 3
Clinical Named Entity Recognition using Contextualized Token Representations | - | 0
CharacterChat: Supporting the Creation of Fictional Characters through Conversation and Progressive Manifestation with a Chatbot | - | 0
A Case Study in Bootstrapping Ontology Graphs from Textbooks | - | 0
Combining Analogy with Language Models for Knowledge Extraction | Code | 0
Prompt Tuning or Fine-Tuning - Investigating Relational Knowledge in Pre-Trained Language Models | Code | 0
Structured in Space, Randomized in Time: Leveraging Dropout in RNNs for Efficient Training | - | 0
End-to-End Task-Oriented Dialog Modeling with Semi-Structured Knowledge Management | Code | 0
Data Augmentation for Opcode Sequence Based Malware Detection | - | 0
Ad Text Classification with Transformer-Based Natural Language Processing Methods | - | 0
Membership Inference on Word Embedding and Beyond | - | 0
Secure Distributed Training at Scale | Code | 1
CLIP2Video: Mastering Video-Text Retrieval via Image CLIP | Code | 1
A Discriminative Entity-Aware Language Model for Virtual Assistants | - | 0
A Brief Study on the Effects of Training Generative Dialogue Models with a Semantic loss | Code | 0
RSTNet: Captioning With Adaptive Attention on Visual and Non-Visual Words | Code | 1
MaxUp: Lightweight Adversarial Training With Data Augmentation Improves Neural Network Training | - | 0
Transitional Adaptation of Pretrained Models for Visual Storytelling | - | 0
Process for Adapting Language Models to Society (PALMS) with Values-Targeted Datasets | - | 0
Low Resource German ASR with Untranscribed Data Spoken by Non-native Children -- INTERSPEECH 2021 Shared Task SPAPL System | - | 0
Label prompt for multi-label text classification | - | 0
SPBERT: An Efficient Pre-training BERT on SPARQL Queries for Question Answering over Knowledge Graphs | Code | 1
Learning to Complete Code with Sketches | - | 0
Distributed Deep Learning in Open Collaborations | Code | 1
An Improved Single Step Non-autoregressive Transformer for Automatic Speech Recognition | - | 0
BitFit: Simple Parameter-efficient Fine-tuning for Transformer-based Masked Language-models | Code | 1
Golos: Russian Dataset for Speech Research | Code | 1
On Anytime Learning at Macroscale | Code | 0
LoRA: Low-Rank Adaptation of Large Language Models | Code | 2
Augmented Neural Story Generation with Commonsense Inference | - | 0
Page 266 of 353

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | Decay RNN | Validation perplexity | 76.67 | - | Unverified
2 | GRU | Validation perplexity | 53.78 | - | Unverified
3 | LSTM | Validation perplexity | 52.73 | - | Unverified
4 | LSTM | Test perplexity | 48.7 | - | Unverified
5 | Temporal CNN | Test perplexity | 45.2 | - | Unverified
6 | TCN | Test perplexity | 45.19 | - | Unverified
7 | GCNN-8 | Test perplexity | 44.9 | - | Unverified
8 | Neural cache model (size = 100) | Test perplexity | 44.8 | - | Unverified
9 | Neural cache model (size = 2,000) | Test perplexity | 40.8 | - | Unverified
10 | GPT-2 Small | Test perplexity | 37.5 | - | Unverified
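
The perplexity figures in these tables are the exponential of a model's average per-token negative log-likelihood on the evaluation set; lower is better. A minimal sketch of the computation (the per-token loss values below are made up for illustration):

```python
import math

def perplexity(token_nll):
    """Perplexity = exp of the mean negative log-likelihood (in nats)."""
    return math.exp(sum(token_nll) / len(token_nll))

# Illustrative per-token losses; lower perplexity means the model assigns
# higher probability to the held-out text.
print(round(perplexity([3.2, 4.1, 3.8, 3.5]), 1))  # 38.5
```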
# | Model | Metric | Claimed | Verified | Status
1 | TCN | Test perplexity | 108.47 | - | Unverified
2 | Seq-U-Net | Test perplexity | 107.95 | - | Unverified
3 | GRU (Bai et al., 2018) | Test perplexity | 92.48 | - | Unverified
4 | R-Transformer | Test perplexity | 84.38 | - | Unverified
5 | Zaremba et al. (2014) - LSTM (medium) | Test perplexity | 82.7 | - | Unverified
6 | Gal & Ghahramani (2016) - Variational LSTM (medium) | Test perplexity | 79.7 | - | Unverified
7 | LSTM (Bai et al., 2018) | Test perplexity | 78.93 | - | Unverified
8 | Zaremba et al. (2014) - LSTM (large) | Test perplexity | 78.4 | - | Unverified
9 | Gal & Ghahramani (2016) - Variational LSTM (large) | Test perplexity | 75.2 | - | Unverified
10 | Inan et al. (2016) - Variational RHN | Test perplexity | 66 | - | Unverified
# | Model | Metric | Claimed | Verified | Status
1 | LSTM (7 layers) | Bit per Character (BPC) | 1.67 | - | Unverified
2 | Hypernetworks | Bit per Character (BPC) | 1.34 | - | Unverified
3 | SHA-LSTM (4 layers, h=1024, no attention head) | Bit per Character (BPC) | 1.33 | - | Unverified
4 | LN HM-LSTM | Bit per Character (BPC) | 1.32 | - | Unverified
5 | ByteNet | Bit per Character (BPC) | 1.31 | - | Unverified
6 | Recurrent Highway Networks | Bit per Character (BPC) | 1.27 | - | Unverified
7 | Large FS-LSTM-4 | Bit per Character (BPC) | 1.25 | - | Unverified
8 | Large mLSTM | Bit per Character (BPC) | 1.24 | - | Unverified
9 | AWD-LSTM (3 layers) | Bit per Character (BPC) | 1.23 | - | Unverified
10 | Cluster-Former (#C=512) | Bit per Character (BPC) | 1.22 | - | Unverified
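
The bit-per-character entries above are the character-level cross-entropy measured in base 2, so a model that predicted each character uniformly over a 256-symbol alphabet would score 8 BPC. These are the standard definitions, not taken from any specific entry in the table:

```latex
\mathrm{BPC} = -\frac{1}{N}\sum_{i=1}^{N} \log_2 p\!\left(c_i \mid c_{<i}\right),
\qquad
\mathrm{PPL}_{\text{char}} = 2^{\,\mathrm{BPC}}
```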
# | Model | Metric | Claimed | Verified | Status
1 | Smaller Transformer 126M (pre-trained) | Test perplexity | 33 | - | Unverified
2 | OPT 125M | Test perplexity | 32.26 | - | Unverified
3 | Larger Transformer 771M (pre-trained) | Test perplexity | 28.1 | - | Unverified
4 | OPT 1.3B | Test perplexity | 19.55 | - | Unverified
5 | GPT-Neo 125M | Test perplexity | 17.83 | - | Unverified
6 | OPT 2.7B | Test perplexity | 17.81 | - | Unverified
7 | Smaller Transformer 126M (fine-tuned) | Test perplexity | 12 | - | Unverified
8 | GPT-Neo 1.3B | Test perplexity | 11.46 | - | Unverified
9 | Transformer 125M | Test perplexity | 10.7 | - | Unverified
10 | GPT-Neo 2.7B | Test perplexity | 10.44 | - | Unverified