SOTAVerified

Language Modelling

A language model is a model of natural language. Language models are useful for a variety of tasks, including speech recognition, machine translation, natural language generation (generating more human-like text), optical character recognition, route optimization, handwriting recognition, grammar induction, and information retrieval.

Large language models (LLMs), currently their most advanced form, are predominantly based on transformers trained on larger datasets (frequently using text scraped from the public internet). They have superseded recurrent neural network-based models, which had previously superseded purely statistical models such as the word n-gram language model.

Source: Wikipedia
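
The purely statistical models mentioned above estimate the probability of a word sequence from n-gram counts. As a concrete illustration, here is a minimal sketch of a word bigram model with add-one smoothing; the toy corpus, function names, and smoothing choice are illustrative assumptions, not the setup of any paper listed below.

```python
from collections import Counter, defaultdict

# Toy corpus; in practice the counts would come from a large text collection.
corpus = "the cat sat on the mat . the dog sat on the rug .".split()

unigram_counts = Counter(corpus)
bigram_counts = defaultdict(Counter)
for prev, word in zip(corpus, corpus[1:]):
    bigram_counts[prev][word] += 1

vocab_size = len(unigram_counts)

def bigram_prob(prev, word):
    # P(word | prev) estimated from counts with add-one (Laplace) smoothing.
    return (bigram_counts[prev][word] + 1) / (unigram_counts[prev] + vocab_size)

def sequence_prob(tokens):
    # Probability of a token sequence as a product of bigram probabilities.
    p = 1.0
    for prev, word in zip(tokens, tokens[1:]):
        p *= bigram_prob(prev, word)
    return p

print(sequence_prob("the cat sat on the rug .".split()))
```

Transformer-based LLMs replace these count-based estimates with a neural network that predicts each token from the full preceding context rather than a fixed-size window.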

Papers

Showing 13751–13800 of 17610 papers

Title | Status | Hype
SJ_AJ@DravidianLangTech-EACL2021: Task-Adaptive Pre-Training of Multilingual BERT models for Offensive Language Identification | Code | 0
Adversarial Contrastive Pre-training for Protein Sequences | - | 0
ShufText: A Simple Black Box Approach to Evaluate the Fragility of Text Classification Models | - | 0
Speech Recognition by Simply Fine-tuning BERT | - | 0
N-grams Bayesian Differential Privacy | - | 0
BCN2BRNO: ASR System Fusion for Albayzin 2020 Speech to Text Challenge | - | 0
BENDR: using transformers and a contrastive self-supervised learning task to learn from massive amounts of EEG data | Code | 1
Explaining Natural Language Processing Classifiers with Occlusion and Language Modeling | Code | 0
BERTaú: Itaú BERT for digital customer service | - | 0
DRAG: Director-Generator Language Modelling Framework for Non-Parallel Author Stylized Rewriting | - | 0
ProtoDA: Efficient Transfer Learning for Few-Shot Intent Classification | - | 0
Tokens-to-Token ViT: Training Vision Transformers from Scratch on ImageNet | Code | 2
LESA: Linguistic Encapsulation and Semantic Amalgamation Based Generalised Claim Detection from Online Content | Code | 1
Weakly Supervised Neuro-Symbolic Module Networks for Numerical Reasoning | - | 0
Developing for personalised learning: the long road from educational objectives to development and feedback | - | 0
Language Modelling as a Multi-Task Problem | - | 0
Unsupervised Abstractive Summarization of Bengali Text Documents | Code | 0
Muppet: Massive Multi-task Representations with Pre-Finetuning | Code | 0
First Align, then Predict: Understanding the Cross-Lingual Ability of Multilingual BERT | Code | 0
CLiMP: A Benchmark for Chinese Language Model Evaluation | - | 0
Evaluation of BERT and ALBERT Sentence Embedding Performance on Downstream NLP Tasks | - | 0
Analyzing Zero-shot Cross-lingual Transfer in Supervised NLP Tasks | - | 0
Disambiguating Symbolic Expressions in Informal Documents | - | 0
Cross-lingual Visual Pre-training for Multimodal Machine Translation | Code | 1
EGFI: Drug-Drug Interaction Extraction and Generation with Fusion of Enriched Entity and Sentence Information | Code | 1
CPT: Efficient Deep Neural Network Training via Cyclic Precision | Code | 1
PolyLM: Learning about Polysemy through Language Modeling | Code | 1
WangchanBERTa: Pretraining transformer-based Thai Language Models | Code | 1
MERMAID: Metaphor Generation with Symbolism and Discriminative Decoding | - | 0
Training Multilingual Pre-trained Language Model with Byte-level Subwords | - | 0
k-Neighbor Based Curriculum Sampling for Sequence Prediction | - | 0
The Impact of Multiple Parallel Phrase Suggestions on Email Input and Composition Behaviour of Native and Non-Native English Writers | - | 0
PalmTree: Learning an Assembly Language Model for Instruction Embedding | Code | 1
Zero-shot Generalization in Dialog State Tracking through Generative Question Answering | - | 0
WeChat AI & ICT's Submission for DSTC9 Interactive Dialogue Evaluation Track | - | 0
Towards Facilitating Empathic Conversations in Online Mental Health Support: A Reinforcement Learning Approach | Code | 1
Joint Energy-based Model Training for Better Calibrated Natural Language Understanding Models | Code | 0
Efficiently Fusing Pretrained Acoustic and Linguistic Encoders for Low-resource Speech Recognition | - | 0
Grid Search Hyperparameter Benchmarking of BERT, ALBERT, and LongFormer on DuoRC | - | 0
KDLSQ-BERT: A Quantized Bert Combining Knowledge Distillation with Learned Step Size Quantization | - | 0
Persistent Anti-Muslim Bias in Large Language Models | Code | 1
Transformer-based Language Model Fine-tuning Methods for COVID-19 Fake News Detection | - | 0
ECOL: Early Detection of COVID Lies Using Content, Prior Knowledge and Source Information | Code | 0
Improving Commonsense Causal Reasoning by Adversarial Training and Data Augmentation | - | 0
Self-Training Pre-Trained Language Models for Zero- and Few-Shot Multi-Dialectal Arabic Sequence Labeling | Code | 0
Implicit Unlikelihood Training: Improving Neural Text Generation with Reinforcement Learning | Code | 1
Evaluating Deep Learning Approaches for Covid19 Fake News Detection | - | 0
Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity | Code | 2
Learning Better Sentence Representation with Syntax Information | - | 0
Trankit: A Light-Weight Transformer-based Toolkit for Multilingual Natural Language Processing | Code | 1
Page 276 of 353

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | Decay RNN | Validation perplexity | 76.67 | - | Unverified
2 | GRU | Validation perplexity | 53.78 | - | Unverified
3 | LSTM | Validation perplexity | 52.73 | - | Unverified
4 | LSTM | Test perplexity | 48.7 | - | Unverified
5 | Temporal CNN | Test perplexity | 45.2 | - | Unverified
6 | TCN | Test perplexity | 45.19 | - | Unverified
7 | GCNN-8 | Test perplexity | 44.9 | - | Unverified
8 | Neural cache model (size = 100) | Test perplexity | 44.8 | - | Unverified
9 | Neural cache model (size = 2,000) | Test perplexity | 40.8 | - | Unverified
10 | GPT-2 Small | Test perplexity | 37.5 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | TCN | Test perplexity | 108.47 | - | Unverified
2 | Seq-U-Net | Test perplexity | 107.95 | - | Unverified
3 | GRU (Bai et al., 2018) | Test perplexity | 92.48 | - | Unverified
4 | R-Transformer | Test perplexity | 84.38 | - | Unverified
5 | Zaremba et al. (2014) - LSTM (medium) | Test perplexity | 82.7 | - | Unverified
6 | Gal & Ghahramani (2016) - Variational LSTM (medium) | Test perplexity | 79.7 | - | Unverified
7 | LSTM (Bai et al., 2018) | Test perplexity | 78.93 | - | Unverified
8 | Zaremba et al. (2014) - LSTM (large) | Test perplexity | 78.4 | - | Unverified
9 | Gal & Ghahramani (2016) - Variational LSTM (large) | Test perplexity | 75.2 | - | Unverified
10 | Inan et al. (2016) - Variational RHN | Test perplexity | 66 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | LSTM (7 layers) | Bits per Character (BPC) | 1.67 | - | Unverified
2 | Hypernetworks | Bits per Character (BPC) | 1.34 | - | Unverified
3 | SHA-LSTM (4 layers, h=1024, no attention head) | Bits per Character (BPC) | 1.33 | - | Unverified
4 | LN HM-LSTM | Bits per Character (BPC) | 1.32 | - | Unverified
5 | ByteNet | Bits per Character (BPC) | 1.31 | - | Unverified
6 | Recurrent Highway Networks | Bits per Character (BPC) | 1.27 | - | Unverified
7 | Large FS-LSTM-4 | Bits per Character (BPC) | 1.25 | - | Unverified
8 | Large mLSTM | Bits per Character (BPC) | 1.24 | - | Unverified
9 | AWD-LSTM (3 layers) | Bits per Character (BPC) | 1.23 | - | Unverified
10 | Cluster-Former (#C=512) | Bits per Character (BPC) | 1.22 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Smaller Transformer 126M (pre-trained) | Test perplexity | 33 | - | Unverified
2 | OPT 125M | Test perplexity | 32.26 | - | Unverified
3 | Larger Transformer 771M (pre-trained) | Test perplexity | 28.1 | - | Unverified
4 | OPT 1.3B | Test perplexity | 19.55 | - | Unverified
5 | GPT-Neo 125M | Test perplexity | 17.83 | - | Unverified
6 | OPT 2.7B | Test perplexity | 17.81 | - | Unverified
7 | Smaller Transformer 126M (fine-tuned) | Test perplexity | 12 | - | Unverified
8 | GPT-Neo 1.3B | Test perplexity | 11.46 | - | Unverified
9 | Transformer 125M | Test perplexity | 10.7 | - | Unverified
10 | GPT-Neo 2.7B | Test perplexity | 10.44 | - | Unverified
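
The leaderboards above use two closely related metrics: test or validation perplexity for word-level benchmarks, and bits per character (BPC) for character-level ones. Both are transforms of the model's average negative log-likelihood on the evaluation set, so lower is better throughout. Below is a minimal sketch of the standard conversions, using purely illustrative loss values rather than numbers from any table above.

```python
import math

def perplexity(nats_per_token):
    # Perplexity = exp(mean negative log-likelihood in nats per token).
    return math.exp(sum(nats_per_token) / len(nats_per_token))

def bits_per_character(nats_per_char):
    # BPC = mean negative log-likelihood per character, converted to base 2.
    return sum(nats_per_char) / (len(nats_per_char) * math.log(2))

# Illustrative losses only; not taken from any leaderboard above.
token_losses = [3.6, 3.7, 3.5, 3.8]      # nats per word/token
char_losses = [0.85, 0.86, 0.84]         # nats per character

print(perplexity(token_losses))          # about 38.5
print(bits_per_character(char_losses))   # about 1.23
```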