SOTAVerified

Language Modelling

A language model is a model of natural language. Language models are useful for a variety of tasks, including speech recognition, machine translation, natural language generation (generating more human-like text), optical character recognition, route optimization, handwriting recognition, grammar induction, and information retrieval.

Large language models (LLMs), currently their most advanced form, are predominantly based on transformers trained on larger datasets (frequently using words scraped from the public internet). They have superseded recurrent neural network-based models, which had previously superseded the purely statistical models, such as the word n-gram language model.

Source: Wikipedia
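
The progression described above runs from count-based n-gram models to RNNs to transformers. As a concrete illustration of the earliest of these, here is a minimal sketch of a word-bigram model with add-one smoothing; the toy corpus and the function name are illustrative, not taken from any paper listed below.

    from collections import Counter

    # Toy corpus; real n-gram models were estimated from large text collections.
    corpus = "the cat sat on the mat . the dog sat on the rug .".split()

    bigrams = Counter(zip(corpus, corpus[1:]))
    unigrams = Counter(corpus)
    vocab_size = len(set(corpus))

    def bigram_prob(prev, word):
        # P(word | prev) with add-one (Laplace) smoothing
        return (bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size)

    print(bigram_prob("the", "cat"))  # seen bigram: higher probability
    print(bigram_prob("cat", "rug"))  # unseen bigram: nonzero thanks to smoothing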

Papers

Showing 13151–13200 of 17610 papers

Title | Status | Hype
Improving Low-Resource Named Entity Recognition via Label-Aware Data Augmentation and Curriculum Denoising | – | 0
Automatic Traditional Ancient Chinese Texts Segmentation and Punctuation Based on Pre-training Language Model | – | 0
NS-Hunter: BERT-Cloze Based Semantic Denoising for Distantly Supervised Relation Classification | – | 0
Using Gender- and Polarity-Informed Models to Investigate Bias | – | 0
CommitBERT: Commit Message Generation Using Pre-Trained Programming Language Model | Code | 1
Meta-Learning for Few-Shot Named Entity Recognition | – | 0
Product Review Translation: Parallel Corpus Creation and Robustness towards User-generated Noisy Text | – | 0
Rakuten’s Participation in WAT 2021: Examining the Effectiveness of Pre-trained Models for Multilingual and Multimodal Machine Translation | – | 0
Time-Efficient Code Completion Model for the R Programming Language | Code | 0
Stereotyping Norwegian Salmon: An Inventory of Pitfalls in Fairness Benchmark Datasets | – | 0
Measuring and Improving BERT's Mathematical Abilities by Predicting the Order of Reasoning | – | 0
QASR: QCRI Aljazeera Speech Resource – A Large Scale Annotated Arabic Speech Corpus | – | 0
nmT5 - Is parallel data still relevant for pre-training massively multilingual language models? | – | 0
MulDA: A Multilingual Data Augmentation Framework for Low-Resource Cross-Lingual NER | – | 0
KACE: Generating Knowledge Aware Contrastive Explanations for Natural Language Inference | – | 0
PRAL: A Tailored Pre-Training Model for Task-Oriented Dialog Generation | – | 0
Selecting Informative Contexts Improves Language Model Fine-tuning | – | 0
PHMOSpell: Phonological and Morphological Knowledge Guided Chinese Spelling Check | – | 0
ProtAugment: Intent Detection Meta-Learning through Unsupervised Diverse Paraphrasing | Code | 1
MVP-BERT: Multi-Vocab Pre-training for Chinese BERT | – | 0
PLOME: Pre-training with Misspelled Knowledge for Chinese Spelling Correction | Code | 1
Unleash GPT-2 Power for Event Detection | – | 0
DeepBlueAI at SemEval-2021 Task 7: Detecting and Rating Humor and Offense with Stacking Diverse Language Model-Based Methods | – | 0
eMLM: A New Pre-training Objective for Emotion Related Tasks | Code | 1
EnsLM: Ensemble Language Model for Data Diversity by Semantic Clustering | Code | 0
Best of Both Worlds: Making High Accuracy Non-incremental Transformer-based Disfluency Detection Incremental | – | 0
Evaluating morphological typology in zero-shot cross-lingual transfer | – | 0
AND does not mean OR: Using Formal Languages to Study Language Models' Representations | – | 0
A Pre-training Strategy for Zero-Resource Response Selection in Knowledge-Grounded Conversations | – | 0
Attending Self-Attention: A Case Study of Visually Grounded Supervision in Vision-and-Language Transformers | – | 0
A Targeted Assessment of Incremental Processing in Neural Language Models and Humans | – | 0
AStarTwice at SemEval-2021 Task 5: Toxic Span Detection Using RoBERTa-CRF, Domain Specific Pre-Training and Self-Training | – | 0
Entity at SemEval-2021 Task 5: Weakly Supervised Token Labelling for Toxic Spans Detection | Code | 0
CLaC-BP at SemEval-2021 Task 8: SciBERT Plus Rules for MeasEval | – | 0
GlossReader at SemEval-2021 Task 2: Reading Definitions Improves Contextualized Word Embeddings | – | 0
Cambridge at SemEval-2021 Task 2: Neural WiC-Model with Data Augmentation and Exploration of Representation | – | 0
Noobs at SemEval-2021 Task 4: Masked Language Modeling for abstract answer prediction | – | 0
SkoltechNLP at SemEval-2021 Task 2: Generating Cross-Lingual Training Data for the Word-in-Context Task | – | 0
S-NLP at SemEval-2021 Task 5: An Analysis of Dual Networks for Sequence Tagging | – | 0
MedAI at SemEval-2021 Task 5: Start-to-end Tagging Framework for Toxic Spans Detection | – | 0
PINGAN Omini-Sinitic at SemEval-2021 Task 4: Reading Comprehension of Abstract Meaning | – | 0
RoMa at SemEval-2021 Task 7: A Transformer-based Approach for Detecting and Rating Humor and Offense | – | 0
Realised Volatility Forecasting: Machine Learning via Financial Word Embedding | – | 0
Gates Are Not What You Need in RNNs | Code | 0
Towards Continual Entity Learning in Language Models for Conversational Agents | – | 0
Structural Guidance for Transformer Language Models | Code | 1
Pre-train, Prompt, and Predict: A Systematic Survey of Prompting Methods in Natural Language Processing | Code | 1
MWP-BERT: Numeracy-Augmented Pre-training for Math Word Problem Solving | Code | 1
Goal-Oriented Script Construction | Code | 0
Combining Probabilistic Logic and Deep Learning for Self-Supervised Learning | – | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | Decay RNN | Validation perplexity | 76.67 | – | Unverified
2 | GRU | Validation perplexity | 53.78 | – | Unverified
3 | LSTM | Validation perplexity | 52.73 | – | Unverified
4 | LSTM | Test perplexity | 48.7 | – | Unverified
5 | Temporal CNN | Test perplexity | 45.2 | – | Unverified
6 | TCN | Test perplexity | 45.19 | – | Unverified
7 | GCNN-8 | Test perplexity | 44.9 | – | Unverified
8 | Neural cache model (size = 100) | Test perplexity | 44.8 | – | Unverified
9 | Neural cache model (size = 2,000) | Test perplexity | 40.8 | – | Unverified
10 | GPT-2 Small | Test perplexity | 37.5 | – | Unverified
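
The perplexity figures in these tables are the exponential of the model's average per-token negative log-likelihood on held-out text, so lower is better. A minimal sketch of the computation, assuming the model's per-token probabilities are already in hand (the values below are made up):

    import math

    # Model-assigned probability of each held-out token (illustrative values only).
    token_probs = [0.10, 0.25, 0.02, 0.30]

    # Average negative log-likelihood in nats, then exponentiate.
    nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    perplexity = math.exp(nll)
    print(round(perplexity, 2))  # ~9.04 on this toy input
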
# | Model | Metric | Claimed | Verified | Status
1 | TCN | Test perplexity | 108.47 | – | Unverified
2 | Seq-U-Net | Test perplexity | 107.95 | – | Unverified
3 | GRU (Bai et al., 2018) | Test perplexity | 92.48 | – | Unverified
4 | R-Transformer | Test perplexity | 84.38 | – | Unverified
5 | Zaremba et al. (2014) - LSTM (medium) | Test perplexity | 82.7 | – | Unverified
6 | Gal & Ghahramani (2016) - Variational LSTM (medium) | Test perplexity | 79.7 | – | Unverified
7 | LSTM (Bai et al., 2018) | Test perplexity | 78.93 | – | Unverified
8 | Zaremba et al. (2014) - LSTM (large) | Test perplexity | 78.4 | – | Unverified
9 | Gal & Ghahramani (2016) - Variational LSTM (large) | Test perplexity | 75.2 | – | Unverified
10 | Inan et al. (2016) - Variational RHN | Test perplexity | 66 | – | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | LSTM (7 layers) | Bits per Character (BPC) | 1.67 | – | Unverified
2 | Hypernetworks | Bits per Character (BPC) | 1.34 | – | Unverified
3 | SHA-LSTM (4 layers, h=1024, no attention head) | Bits per Character (BPC) | 1.33 | – | Unverified
4 | LN HM-LSTM | Bits per Character (BPC) | 1.32 | – | Unverified
5 | ByteNet | Bits per Character (BPC) | 1.31 | – | Unverified
6 | Recurrent Highway Networks | Bits per Character (BPC) | 1.27 | – | Unverified
7 | Large FS-LSTM-4 | Bits per Character (BPC) | 1.25 | – | Unverified
8 | Large mLSTM | Bits per Character (BPC) | 1.24 | – | Unverified
9 | AWD-LSTM (3 layers) | Bits per Character (BPC) | 1.23 | – | Unverified
10 | Cluster-Former (#C=512) | Bits per Character (BPC) | 1.22 | – | Unverified
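
Bits per character (BPC), used in the table above, is the same cross-entropy quantity expressed in bits per character rather than nats, so 2**BPC is the equivalent character-level perplexity. A short sketch with an illustrative loss value:

    import math

    loss_nats = 0.85               # illustrative per-character cross-entropy in nats
    bpc = loss_nats / math.log(2)  # convert nats to bits
    print(round(bpc, 2))           # ~1.23, in the range the table above reports
    print(round(2 ** bpc, 2))      # equivalent character-level perplexity (= exp(loss_nats))
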
# | Model | Metric | Claimed | Verified | Status
1 | Smaller Transformer 126M (pre-trained) | Test perplexity | 33 | – | Unverified
2 | OPT 125M | Test perplexity | 32.26 | – | Unverified
3 | Larger Transformer 771M (pre-trained) | Test perplexity | 28.1 | – | Unverified
4 | OPT 1.3B | Test perplexity | 19.55 | – | Unverified
5 | GPT-Neo 125M | Test perplexity | 17.83 | – | Unverified
6 | OPT 2.7B | Test perplexity | 17.81 | – | Unverified
7 | Smaller Transformer 126M (fine-tuned) | Test perplexity | 12 | – | Unverified
8 | GPT-Neo 1.3B | Test perplexity | 11.46 | – | Unverified
9 | Transformer 125M | Test perplexity | 10.7 | – | Unverified
10 | GPT-Neo 2.7B | Test perplexity | 10.44 | – | Unverified