SOTAVerified

Language Modelling

A language model is a probabilistic model of natural language. Language models are useful for a variety of tasks, including speech recognition, machine translation, natural language generation (generating more human-like text), optical character recognition, route optimization, handwriting recognition, grammar induction, and information retrieval.

Large language models (LLMs), currently their most advanced form, are predominantly based on transformers trained on large datasets (frequently using text scraped from the public internet). They have superseded recurrent neural network-based models, which had previously superseded purely statistical models such as the word n-gram language model.

Source: Wikipedia
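
The contrast above between statistical and neural language models is easiest to see in code. Below is a minimal sketch of the classical approach, a word bigram (n = 2) language model with add-one smoothing; the toy corpus and function names are illustrative only, not taken from any paper or benchmark on this page.

```python
from collections import Counter

# Toy corpus; real n-gram models are estimated from millions of words.
corpus = "the cat sat on the mat . the dog sat on the rug .".split()

unigram_counts = Counter(corpus)
bigram_counts = Counter(zip(corpus, corpus[1:]))
vocab_size = len(unigram_counts)

def bigram_prob(prev_word: str, word: str) -> float:
    """P(word | prev_word) with add-one (Laplace) smoothing, so that
    unseen bigrams still get a small non-zero probability."""
    return (bigram_counts[(prev_word, word)] + 1) / (unigram_counts[prev_word] + vocab_size)

print(bigram_prob("the", "cat"))  # seen bigram: (1 + 1) / (4 + 8) ~= 0.17
print(bigram_prob("cat", "the"))  # unseen bigram: (0 + 1) / (1 + 8) ~= 0.11
```

A transformer-based LLM replaces this fixed-window count table with a learned distribution conditioned on arbitrarily long context, which is why the neural entries dominate the benchmark tables further down this page.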

Papers

Showing 13751–13800 of 17610 papers

Title | Status | Hype
The University of Edinburgh’s Submission to the IWSLT21 Simultaneous Translation Task | | 0
MedAI at SemEval-2021 Task 5: Start-to-end Tagging Framework for Toxic Spans Detection | | 0
PHMOSpell: Phonological and Morphological Knowledge Guided Chinese Spelling Check | | 0
Probing Multi-modal Machine Translation with Pre-trained Language Model | | 0
Multi-Lingual Question Generation with Language Agnostic Language Model | Code | 0
PRAL: A Tailored Pre-Training Model for Task-Oriented Dialog Generation | | 0
Text-in-Context: Token-Level Error Detection for Table-to-Text Generation | Code | 0
nmT5 - Is parallel data still relevant for pre-training massively multilingual language models? | | 0
MVP-BERT: Multi-Vocab Pre-training for Chinese BERT | | 0
S-NLP at SemEval-2021 Task 5: An Analysis of Dual Networks for Sequence Tagging | | 0
KACE: Generating Knowledge Aware Contrastive Explanations for Natural Language Inference | | 0
基于预训练语言模型的繁体古文自动句读研究 (Automatic Traditional Ancient Chinese Texts Segmentation and Punctuation Based on Pre-training Language Model) | | 0
Stereotyping Norwegian Salmon: An Inventory of Pitfalls in Fairness Benchmark Datasets | | 0
PINGAN Omini-Sinitic at SemEval-2021 Task 4: Reading Comprehension of Abstract Meaning | | 0
Selecting Informative Contexts Improves Language Model Fine-tuning | | 0
RoMa at SemEval-2021 Task 7: A Transformer-based Approach for Detecting and Rating Humor and Offense | | 0
Team “NoConflict” at CASE 2021 Task 1: Pretraining for Sentence-Level Protest Event Detection | | 0
Realised Volatility Forecasting: Machine Learning via Financial Word Embedding | | 0
Time-Efficient Code Completion Model for the R Programming Language | Code | 0
MulDA: A Multilingual Data Augmentation Framework for Low-Resource Cross-Lingual NER | | 0
Meta-Learning for Few-Shot Named Entity Recognition | | 0
Personalized Response Generation with Tensor Factorization | | 0
Product Review Translation: Parallel Corpus Creation and Robustness towards User-generated Noisy Text | | 0
Let’s be explicit about that: Distant supervision for implicit discourse relation classification via connective prediction | | 0
Noobs at Semeval-2021 Task 4: Masked Language Modeling for abstract answer prediction | | 0
Measuring and Improving BERT's Mathematical Abilities by Predicting the Order of Reasoning. | | 0
Using Gender- and Polarity-Informed Models to Investigate Bias | | 0
Unleash GPT-2 Power for Event Detection | | 0
GlossReader at SemEval-2021 Task 2: Reading Definitions Improves Contextualized Word Embeddings | | 0
Attending Self-Attention: A Case Study of Visually Grounded Supervision in Vision-and-Language Transformers | | 0
DeepBlueAI at SemEval-2021 Task 7: Detecting and Rating Humor and Offense with Stacking Diverse Language Model-Based Methods | | 0
Improving Low-Resource Named Entity Recognition via Label-Aware Data Augmentation and Curriculum Denoising | | 0
Document-Grounded Goal-Oriented Dialogue Systems on Pre-Trained Language Model with Diverse Input Representation | | 0
Gates Are Not What You Need in RNNs | Code | 0
Cambridge at SemEval-2021 Task 2: Neural WiC-Model with Data Augmentation and Exploration of Representation | | 0
A Comparison of Sentence-Weighting Techniques for NMT | | 0
AStarTwice at SemEval-2021 Task 5: Toxic Span Detection Using RoBERTa-CRF, Domain Specific Pre-Training and Self-Training | | 0
Entity at SemEval-2021 Task 5: Weakly Supervised Token Labelling for Toxic Spans Detection | Code | 0
A Pre-training Strategy for Zero-Resource Response Selection in Knowledge-Grounded Conversations | | 0
He is very intelligent, she is very beautiful? On Mitigating Social Biases in Language Modelling and Generation | | 0
CLaC-BP at SemEval-2021 Task 8: SciBERT Plus Rules for MeasEval | | 0
Decoding, Fast and Slow: A Case Study on Balancing Trade-Offs in Incremental, Character-level Pragmatic Reasoning | | 0
Evaluating morphological typology in zero-shot cross-lingual transfer | | 0
AND does not mean OR: Using Formal Languages to Study Language Models' Representations | | 0
IBM MNLP IE at CASE 2021 Task 1: Multigranular and Multilingual Event Detection on Protest News | | 0
Entity and Evidence Guided Document-Level Relation Extraction | | 0
Enhancing Language Generation with Effective Checkpoints of Pre-trained Language Model | | 0
Best of Both Worlds: Making High Accuracy Non-incremental Transformer-based Disfluency Detection Incremental | | 0
A Targeted Assessment of Incremental Processing in Neural Language Models and Humans | | 0
EnsLM: Ensemble Language Model for Data Diversity by Semantic Clustering | Code | 0
Page 276 of 353

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | Decay RNN | Validation perplexity | 76.67 | | Unverified
2 | GRU | Validation perplexity | 53.78 | | Unverified
3 | LSTM | Validation perplexity | 52.73 | | Unverified
4 | LSTM | Test perplexity | 48.7 | | Unverified
5 | Temporal CNN | Test perplexity | 45.2 | | Unverified
6 | TCN | Test perplexity | 45.19 | | Unverified
7 | GCNN-8 | Test perplexity | 44.9 | | Unverified
8 | Neural cache model (size = 100) | Test perplexity | 44.8 | | Unverified
9 | Neural cache model (size = 2,000) | Test perplexity | 40.8 | | Unverified
10 | GPT-2 Small | Test perplexity | 37.5 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | TCN | Test perplexity | 108.47 | | Unverified
2 | Seq-U-Net | Test perplexity | 107.95 | | Unverified
3 | GRU (Bai et al., 2018) | Test perplexity | 92.48 | | Unverified
4 | R-Transformer | Test perplexity | 84.38 | | Unverified
5 | Zaremba et al. (2014) - LSTM (medium) | Test perplexity | 82.7 | | Unverified
6 | Gal & Ghahramani (2016) - Variational LSTM (medium) | Test perplexity | 79.7 | | Unverified
7 | LSTM (Bai et al., 2018) | Test perplexity | 78.93 | | Unverified
8 | Zaremba et al. (2014) - LSTM (large) | Test perplexity | 78.4 | | Unverified
9 | Gal & Ghahramani (2016) - Variational LSTM (large) | Test perplexity | 75.2 | | Unverified
10 | Inan et al. (2016) - Variational RHN | Test perplexity | 66 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | LSTM (7 layers) | Bits per Character (BPC) | 1.67 | | Unverified
2 | Hypernetworks | Bits per Character (BPC) | 1.34 | | Unverified
3 | SHA-LSTM (4 layers, h=1024, no attention head) | Bits per Character (BPC) | 1.33 | | Unverified
4 | LN HM-LSTM | Bits per Character (BPC) | 1.32 | | Unverified
5 | ByteNet | Bits per Character (BPC) | 1.31 | | Unverified
6 | Recurrent Highway Networks | Bits per Character (BPC) | 1.27 | | Unverified
7 | Large FS-LSTM-4 | Bits per Character (BPC) | 1.25 | | Unverified
8 | Large mLSTM | Bits per Character (BPC) | 1.24 | | Unverified
9 | AWD-LSTM (3 layers) | Bits per Character (BPC) | 1.23 | | Unverified
10 | Cluster-Former (#C=512) | Bits per Character (BPC) | 1.22 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Smaller Transformer 126M (pre-trained) | Test perplexity | 33 | | Unverified
2 | OPT 125M | Test perplexity | 32.26 | | Unverified
3 | Larger Transformer 771M (pre-trained) | Test perplexity | 28.1 | | Unverified
4 | OPT 1.3B | Test perplexity | 19.55 | | Unverified
5 | GPT-Neo 125M | Test perplexity | 17.83 | | Unverified
6 | OPT 2.7B | Test perplexity | 17.81 | | Unverified
7 | Smaller Transformer 126M (fine-tuned) | Test perplexity | 12 | | Unverified
8 | GPT-Neo 1.3B | Test perplexity | 11.46 | | Unverified
9 | Transformer 125M | Test perplexity | 10.7 | | Unverified
10 | GPT-Neo 2.7B | Test perplexity | 10.44 | | Unverified
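
Both metrics in the tables above are simple transforms of a model's average cross-entropy on the test set: perplexity exponentiates the mean negative log-likelihood per token, and bits per character rescales the same quantity into base-2 bits. A minimal sketch of the conversions (the loss value below is an illustrative example, not one of the claimed results):

```python
import math

def perplexity(mean_nll_nats: float) -> float:
    """Test perplexity: exp of the mean negative log-likelihood
    (natural log) per token over the test set."""
    return math.exp(mean_nll_nats)

def bits_per_character(mean_nll_nats: float) -> float:
    """Bits per character (BPC): the same mean loss, measured per
    character and converted from nats to bits (divide by ln 2)."""
    return mean_nll_nats / math.log(2)

# Illustrative character-level loss of 0.8525 nats/char:
loss = 0.8525
print(f"BPC = {bits_per_character(loss):.2f}")            # 1.23
print(f"per-char perplexity = {perplexity(loss):.2f}")    # 2.35 (= 2 ** BPC)
```

Lower is better for both metrics, but a word-level perplexity and a BPC figure are not directly comparable, since they average the loss over different units (words versus characters).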