SOTAVerified

Language Modelling

A language model is a probabilistic model of natural language. Language models are useful for a variety of tasks, including speech recognition, machine translation, natural language generation (generating more human-like text), optical character recognition, route optimization, handwriting recognition, grammar induction, and information retrieval.

Large language models (LLMs), currently their most advanced form, are predominantly based on transformers trained on large datasets (frequently using text scraped from the public internet). They have superseded recurrent neural network-based models, which had previously superseded purely statistical models such as word n-gram language models.

Source: Wikipedia
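
For concreteness, the word n-gram models mentioned above assign probabilities to text from counts of short token sequences. Below is a minimal bigram sketch in Python; the toy corpus and the add-alpha smoothing choice are illustrative assumptions, not anything taken from this page.

```python
from collections import Counter

def train_bigram(corpus_sentences):
    """Count unigrams and bigrams over a tokenized corpus."""
    unigrams, bigrams = Counter(), Counter()
    for sent in corpus_sentences:
        tokens = ["<s>"] + sent + ["</s>"]          # sentence boundary markers
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))     # adjacent word pairs
    return unigrams, bigrams

def bigram_prob(unigrams, bigrams, prev, word, alpha=1.0):
    """P(word | prev) with add-alpha smoothing over the observed vocabulary."""
    vocab_size = len(unigrams)
    return (bigrams[(prev, word)] + alpha) / (unigrams[prev] + alpha * vocab_size)

# Toy corpus (hypothetical):
uni, bi = train_bigram([["the", "cat", "sat"], ["the", "dog", "sat"]])
print(bigram_prob(uni, bi, "the", "cat"))   # seen bigram: 0.25
print(bigram_prob(uni, bi, "cat", "dog"))   # unseen bigram: ~0.14
```

A neural or transformer language model plays the same role, replacing these count-based estimates of P(next token | context) with a learned parametric model conditioned on much longer contexts.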

Papers

Showing 13701–13750 of 17610 papers

Title | Status | Hype
Models In a Spelling Bee: Language Models Implicitly Learn the Character Composition of Tokens | Code | 0
Reducing Exposure Bias in Training Recurrent Neural Network Transducers | | 0
Prompt-Learning for Fine-Grained Entity Typing | | 0
Taming the Beast: Learning to Control Neural Conversational Models | | 0
Detection of Criminal Texts for the Polish State Border Guard | | 0
Using BERT Encoding and Sentence-Level Language Model for Sentence Ordering | | 0
UzBERT: pretraining a BERT model for Uzbek | | 0
cushLEPOR: customising hLEPOR metric using Optuna for higher agreement with human judgments or pre-trained language model LaBSE | Code | 0
Exploiting Multi-Object Relationships for Detecting Adversarial Attacks in Complex Scenes | | 0
Language Model Augmented Relevance Score | | 0
Scaling Laws for Deep Learning | | 0
A Weakly Supervised Dataset of Fine-Grained Emotions in Portuguese | Code | 0
Automatic Multi-Label Prompting: Simple and Interpretable Few-Shot Classification | | 0
0.8% Nyquist computational ghost imaging via non-experimental deep learning | | 0
Deduplicating Training Data Makes Language Models Better | | 0
Autoencoders as Tools for Program Synthesis | Code | 0
Deep Natural Language Processing for LinkedIn Search | | 0
Caption Generation on Scenes with Seen and Unseen Object Categories | | 0
Towards Structured Dynamic Sparse Pre-Training of BERT | | 0
Modeling Relevance Ranking under the Pre-training and Fine-tuning Paradigm | | 0
Mounting Video Metadata on Transformer-based Language Model for Open-ended Video Question Answering | | 0
Perturbing Inputs for Fragile Interpretations in Deep Natural Language Processing | Code | 0
Extracting Semantics from Maintenance Records | | 0
A Transformer-based Math Language Model for Handwritten Math Expression Recognition | | 0
SynCoBERT: Syntax-Guided Multi-Modal Contrastive Pre-Training for Code Representation | | 0
BERTHop: An Effective Vision-and-Language Model for Chest X-ray Disease Diagnosis | Code | 0
IntenT5: Search Result Diversification using Causal Language Models | | 0
Do Images really do the Talking? Analysing the significance of Images in Tamil Troll meme classification | Code | 0
Leveraging Commonsense Knowledge on Classifying False News and Determining Checkworthiness of Claims | | 0
Language Model Evaluation in Open-ended Text Generation | | 0
StrucTexT: Structured Text Understanding with Multi-Modal Transformers | Code | 0
Offensive Language and Hate Speech Detection with Deep Learning and Transfer Learning | | 0
Towards Zero-shot Language Modeling | | 0
LadRa-Net: Locally-Aware Dynamic Re-read Attention Net for Sentence Semantic Matching | | 0
Sentence Semantic Regression for Text Generation | | 0
Deriving Disinformation Insights from Geolocalized Twitter Callouts | Code | 0
FMMformer: Efficient and Flexible Transformer via Decomposed Near-field and Far-field Attention | | 0
Curriculum learning for language modeling | Code | 0
Mitigating harm in language models with conditional-likelihood filtration | | 0
Large-Scale Differentially Private BERT | | 0
Your fairness may vary: Pretrained language model fairness in toxic text classification | | 0
Analyzing Speaker Information in Self-Supervised Models to Improve Zero-Resource Speech Processing | Code | 0
Is My Model Using The Right Evidence? Systematic Probes for Examining Evidence-Based Tabular Reasoning | | 0
LICHEE: Improving Language Model Pre-training with Multi-grained Tokenization | Code | 0
Look Back Again: Dual Parallel Attention Network for Accurate and Robust Scene Text Recognition | Code | 0
QASR: QCRI Aljazeera Speech Resource -- A Large Scale Annotated Arabic Speech Corpus | | 0
NS-Hunter: BERT-Cloze Based Semantic Denoising for Distantly Supervised Relation Classification | | 0
Small-Scale Cross-Language Authorship Attribution on Social Media Comments | | 0
Rakuten’s Participation in WAT 2021: Examining the Effectiveness of Pre-trained Models for Multilingual and Multimodal Machine Translation | | 0
SkoltechNLP at SemEval-2021 Task 2: Generating Cross-Lingual Training Data for the Word-in-Context Task | | 0
Page 275 of 353

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | Decay RNN | Validation perplexity | 76.67 | | Unverified
2 | GRU | Validation perplexity | 53.78 | | Unverified
3 | LSTM | Validation perplexity | 52.73 | | Unverified
4 | LSTM | Test perplexity | 48.7 | | Unverified
5 | Temporal CNN | Test perplexity | 45.2 | | Unverified
6 | TCN | Test perplexity | 45.19 | | Unverified
7 | GCNN-8 | Test perplexity | 44.9 | | Unverified
8 | Neural cache model (size = 100) | Test perplexity | 44.8 | | Unverified
9 | Neural cache model (size = 2,000) | Test perplexity | 40.8 | | Unverified
10 | GPT-2 Small | Test perplexity | 37.5 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | TCN | Test perplexity | 108.47 | | Unverified
2 | Seq-U-Net | Test perplexity | 107.95 | | Unverified
3 | GRU (Bai et al., 2018) | Test perplexity | 92.48 | | Unverified
4 | R-Transformer | Test perplexity | 84.38 | | Unverified
5 | Zaremba et al. (2014) - LSTM (medium) | Test perplexity | 82.7 | | Unverified
6 | Gal & Ghahramani (2016) - Variational LSTM (medium) | Test perplexity | 79.7 | | Unverified
7 | LSTM (Bai et al., 2018) | Test perplexity | 78.93 | | Unverified
8 | Zaremba et al. (2014) - LSTM (large) | Test perplexity | 78.4 | | Unverified
9 | Gal & Ghahramani (2016) - Variational LSTM (large) | Test perplexity | 75.2 | | Unverified
10 | Inan et al. (2016) - Variational RHN | Test perplexity | 66 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | LSTM (7 layers) | Bits per Character (BPC) | 1.67 | | Unverified
2 | Hypernetworks | Bits per Character (BPC) | 1.34 | | Unverified
3 | SHA-LSTM (4 layers, h=1024, no attention head) | Bits per Character (BPC) | 1.33 | | Unverified
4 | LN HM-LSTM | Bits per Character (BPC) | 1.32 | | Unverified
5 | ByteNet | Bits per Character (BPC) | 1.31 | | Unverified
6 | Recurrent Highway Networks | Bits per Character (BPC) | 1.27 | | Unverified
7 | Large FS-LSTM-4 | Bits per Character (BPC) | 1.25 | | Unverified
8 | Large mLSTM | Bits per Character (BPC) | 1.24 | | Unverified
9 | AWD-LSTM (3 layers) | Bits per Character (BPC) | 1.23 | | Unverified
10 | Cluster-Former (#C=512) | Bits per Character (BPC) | 1.22 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Smaller Transformer 126M (pre-trained) | Test perplexity | 33 | | Unverified
2 | OPT 125M | Test perplexity | 32.26 | | Unverified
3 | Larger Transformer 771M (pre-trained) | Test perplexity | 28.1 | | Unverified
4 | OPT 1.3B | Test perplexity | 19.55 | | Unverified
5 | GPT-Neo 125M | Test perplexity | 17.83 | | Unverified
6 | OPT 2.7B | Test perplexity | 17.81 | | Unverified
7 | Smaller Transformer 126M (fine-tuned) | Test perplexity | 12 | | Unverified
8 | GPT-Neo 1.3B | Test perplexity | 11.46 | | Unverified
9 | Transformer 125M | Test perplexity | 10.7 | | Unverified
10 | GPT-Neo 2.7B | Test perplexity | 10.44 | | Unverified
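
Every table above reports one of two metrics: validation/test perplexity (word-level benchmarks) or bits per character (character-level benchmarks). Both are deterministic functions of a model's average negative log-likelihood on held-out text. The sketch below shows the standard conversions; the per-token log-probabilities are made-up numbers for illustration only.

```python
import math

def perplexity(token_log_probs):
    """exp of the average negative log-likelihood (natural log) per token."""
    avg_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(avg_nll)

def bits_per_character(token_log_probs, num_chars):
    """Total negative log-likelihood converted to bits, divided by character count."""
    total_bits = -sum(token_log_probs) / math.log(2)  # nats -> bits
    return total_bits / num_chars

# Hypothetical log-probabilities assigned by some model to four tokens:
log_probs = [-3.2, -1.1, -4.0, -2.5]
print(perplexity(log_probs))                        # ~14.88
print(bits_per_character(log_probs, num_chars=20))  # ~0.78
```

For a character-level model the two measures are related by perplexity = 2^BPC per character, which is why the BPC table clusters between roughly 1.2 and 1.7 while the word-level perplexity tables range from about 10 to over 100.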