SOTAVerified

Language Modelling

A language model is a model of natural language. Language models are useful for a variety of tasks, including speech recognition, machine translation, natural language generation (generating more human-like text), optical character recognition, route optimization, handwriting recognition, grammar induction, and information retrieval.

Large language models (LLMs), currently their most advanced form, are predominantly based on transformers trained on larger datasets (frequently using words scraped from the public internet). They have superseded recurrent neural network-based models, which had previously superseded the purely statistical models, such as the word n-gram language model.

Source: Wikipedia
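
The excerpt above names word n-gram models as the statistical predecessors of neural language models. As a rough illustration of the idea, here is a minimal word-bigram sketch with add-one (Laplace) smoothing; the toy corpus, tokenization, and smoothing choice are assumptions made for the example, not taken from any paper listed below.

```python
from collections import Counter

# Toy corpus; a real n-gram model would be estimated from a large text collection.
corpus = "the cat sat on the mat . the dog sat on the rug .".split()

unigram_counts = Counter(corpus)
bigram_counts = Counter(zip(corpus, corpus[1:]))
vocab_size = len(unigram_counts)

def bigram_prob(prev_word, word):
    """P(word | prev_word) from counts, smoothed so unseen pairs keep nonzero mass."""
    return (bigram_counts[(prev_word, word)] + 1) / (unigram_counts[prev_word] + vocab_size)

print(bigram_prob("the", "cat"))  # seen bigram: relatively likely
print(bigram_prob("sat", "the"))  # unseen bigram: smoothed to a small nonzero probability
```

Neural and transformer-based models replace these count tables with learned parameters, but the object being modeled, a distribution over the next token given its context, is the same.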

Papers

Showing 3601–3650 of 17,610 papers

Title | Status | Hype
AlephBERT: A Hebrew Large Pre-Trained Language Model to Start-off your Hebrew NLP Application With | Code | 1
Revisiting Simple Neural Probabilistic Language Models | Code | 1
Librispeech Transducer Model with Internal Language Model Prior Correction | Code | 1
MMBERT: Multimodal BERT Pretraining for Improved Medical VQA | Code | 1
NewsMTSC: A Dataset for (Multi-)Target-dependent Sentiment Classification in Political News Articles | Code | 1
[Re] Rigging the Lottery: Making All Tickets Winners | Code | 1
Finetuning Pretrained Transformers into RNNs | Code | 1
Controllable Generation from Pre-trained Language Models via Inverse Prompting | Code | 1
Structure Inducing Pre-Training | Code | 1
Improving the Lexical Ability of Pretrained Language Models for Unsupervised Neural Machine Translation | Code | 1
Refining Language Models with Compositional Explanations | Code | 1
Inductive Relation Prediction by BERT | Code | 1
Read Like Humans: Autonomous, Bidirectional and Iterative Language Modeling for Scene Text Recognition | Code | 1
MERMAID: Metaphor Generation with Symbolism and Discriminative Decoding | Code | 1
The Interplay of Variant, Size, and Task Type in Arabic Pre-trained Language Models | Code | 1
OAG-BERT: Towards A Unified Backbone Language Model For Academic Knowledge Services | Code | 1
Self-Diagnosis and Self-Debiasing: A Proposal for Reducing Corpus-Based Bias in NLP | Code | 1
Chess as a Testbed for Language Model State Tracking | Code | 1
ZJUKLAB at SemEval-2021 Task 4: Negative Augmentation with Language Model for Reading Comprehension of Abstract Meaning | Code | 1
RoBERTa-wwm-ext Fine-Tuning for Chinese Text Classification | Code | 1
PADA: Example-based Prompt Learning for on-the-fly Adaptation to Unseen Domains | Code | 1
Linear Transformers Are Secretly Fast Weight Programmers | Code | 1
VisualGPT: Data-efficient Adaptation of Pretrained Language Models for Image Captioning | Code | 1
Less is More: Pre-train a Strong Text Encoder for Dense Retrieval Using a Weak Decoder | Code | 1
End-to-end lyrics Recognition with Voice to Singing Style Transfer | Code | 1
GradInit: Learning to Initialize Neural Networks for Stable and Efficient Training | Code | 1
COCO-LM: Correcting and Contrasting Text Sequences for Language Model Pretraining | Code | 1
DOBF: A Deobfuscation Pre-Training Objective for Programming Languages | Code | 1
End-to-end Audio-visual Speech Recognition with Conformers | Code | 1
Proof Artifact Co-training for Theorem Proving with Language Models | Code | 1
Unsupervised Extractive Summarization using Pointwise Mutual Information | Code | 1
Argmax Flows and Multinomial Diffusion: Learning Categorical Distributions | Code | 1
Bias Out-of-the-Box: An Empirical Analysis of Intersectional Occupational Biases in Popular Generative Language Models | Code | 1
Unifying Vision-and-Language Tasks via Text Generation | Code | 1
Phoneme-BERT: Joint Language Modelling of Phoneme Sequence and ASR Transcript | Code | 1
Generative Spoken Language Modeling from Raw Audio | Code | 1
LESA: Linguistic Encapsulation and Semantic Amalgamation Based Generalised Claim Detection from Online Content | Code | 1
BENDR: using transformers and a contrastive self-supervised learning task to learn from massive amounts of EEG data | Code | 1
PolyLM: Learning about Polysemy through Language Modeling | Code | 1
CPT: Efficient Deep Neural Network Training via Cyclic Precision | Code | 1
EGFI: Drug-Drug Interaction Extraction and Generation with Fusion of Enriched Entity and Sentence Information | Code | 1
Cross-lingual Visual Pre-training for Multimodal Machine Translation | Code | 1
WangchanBERTa: Pretraining transformer-based Thai Language Models | Code | 1
PalmTree: Learning an Assembly Language Model for Instruction Embedding | Code | 1
Towards Facilitating Empathic Conversations in Online Mental Health Support: A Reinforcement Learning Approach | Code | 1
Persistent Anti-Muslim Bias in Large Language Models | Code | 1
Implicit Unlikelihood Training: Improving Neural Text Generation with Reinforcement Learning | Code | 1
Trankit: A Light-Weight Transformer-based Toolkit for Multilingual Natural Language Processing | Code | 1
Multitask Learning for Emotion and Personality Detection | Code | 1
PhoNLP: A joint multi-task learning model for Vietnamese part-of-speech tagging, named entity recognition and dependency parsing | Code | 1
Page 73 of 353

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | Decay RNN | Validation perplexity | 76.67 | – | Unverified
2 | GRU | Validation perplexity | 53.78 | – | Unverified
3 | LSTM | Validation perplexity | 52.73 | – | Unverified
4 | LSTM | Test perplexity | 48.7 | – | Unverified
5 | Temporal CNN | Test perplexity | 45.2 | – | Unverified
6 | TCN | Test perplexity | 45.19 | – | Unverified
7 | GCNN-8 | Test perplexity | 44.9 | – | Unverified
8 | Neural cache model (size = 100) | Test perplexity | 44.8 | – | Unverified
9 | Neural cache model (size = 2,000) | Test perplexity | 40.8 | – | Unverified
10 | GPT-2 Small | Test perplexity | 37.5 | – | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | TCN | Test perplexity | 108.47 | – | Unverified
2 | Seq-U-Net | Test perplexity | 107.95 | – | Unverified
3 | GRU (Bai et al., 2018) | Test perplexity | 92.48 | – | Unverified
4 | R-Transformer | Test perplexity | 84.38 | – | Unverified
5 | Zaremba et al. (2014) - LSTM (medium) | Test perplexity | 82.7 | – | Unverified
6 | Gal & Ghahramani (2016) - Variational LSTM (medium) | Test perplexity | 79.7 | – | Unverified
7 | LSTM (Bai et al., 2018) | Test perplexity | 78.93 | – | Unverified
8 | Zaremba et al. (2014) - LSTM (large) | Test perplexity | 78.4 | – | Unverified
9 | Gal & Ghahramani (2016) - Variational LSTM (large) | Test perplexity | 75.2 | – | Unverified
10 | Inan et al. (2016) - Variational RHN | Test perplexity | 66 | – | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | LSTM (7 layers) | Bit per Character (BPC) | 1.67 | – | Unverified
2 | Hypernetworks | Bit per Character (BPC) | 1.34 | – | Unverified
3 | SHA-LSTM (4 layers, h=1024, no attention head) | Bit per Character (BPC) | 1.33 | – | Unverified
4 | LN HM-LSTM | Bit per Character (BPC) | 1.32 | – | Unverified
5 | ByteNet | Bit per Character (BPC) | 1.31 | – | Unverified
6 | Recurrent Highway Networks | Bit per Character (BPC) | 1.27 | – | Unverified
7 | Large FS-LSTM-4 | Bit per Character (BPC) | 1.25 | – | Unverified
8 | Large mLSTM | Bit per Character (BPC) | 1.24 | – | Unverified
9 | AWD-LSTM (3 layers) | Bit per Character (BPC) | 1.23 | – | Unverified
10 | Cluster-Former (#C=512) | Bit per Character (BPC) | 1.22 | – | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Smaller Transformer 126M (pre-trained) | Test perplexity | 33 | – | Unverified
2 | OPT 125M | Test perplexity | 32.26 | – | Unverified
3 | Larger Transformer 771M (pre-trained) | Test perplexity | 28.1 | – | Unverified
4 | OPT 1.3B | Test perplexity | 19.55 | – | Unverified
5 | GPT-Neo 125M | Test perplexity | 17.83 | – | Unverified
6 | OPT 2.7B | Test perplexity | 17.81 | – | Unverified
7 | Smaller Transformer 126M (fine-tuned) | Test perplexity | 12 | – | Unverified
8 | GPT-Neo 1.3B | Test perplexity | 11.46 | – | Unverified
9 | Transformer 125M | Test perplexity | 10.7 | – | Unverified
10 | GPT-Neo 2.7B | Test perplexity | 10.44 | – | Unverified
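
Both metrics in the tables above are simple transforms of a model's average negative log-likelihood on held-out text, so lower is better. The sketch below shows the arithmetic only; the per-token and per-character log-probabilities are made-up placeholder values, not outputs of any listed model.

```python
import math

def perplexity(token_log_probs):
    """Test perplexity: exp of the mean negative natural-log likelihood per token."""
    return math.exp(-sum(token_log_probs) / len(token_log_probs))

def bits_per_character(char_log_probs):
    """Bit per Character (BPC): mean negative log-likelihood per character, in bits."""
    return -sum(char_log_probs) / (len(char_log_probs) * math.log(2))

# Placeholder natural-log probabilities, chosen only to land near the tables' scale.
token_log_probs = [-3.6, -4.1, -2.9, -3.8]
char_log_probs = [-0.85, -0.92, -0.78]

print(perplexity(token_log_probs))         # ~36.6, same scale as the perplexity rows
print(bits_per_character(char_log_probs))  # ~1.23, same scale as the BPC rows
```

Intuitively, a test perplexity of 37.5 (GPT-2 Small above) means the model is, on average, as uncertain about the next word as if it were choosing uniformly among about 37.5 candidates.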