SOTAVerified

Language Modelling

A language model is a model of natural language. Language models are useful for a variety of tasks, including speech recognition, machine translation, natural language generation (generating more human-like text), optical character recognition, route optimization, handwriting recognition, grammar induction, and information retrieval.

Large language models (LLMs), currently their most advanced form, are predominantly based on transformers trained on large datasets (frequently using text scraped from the public internet). They have superseded recurrent neural network-based models, which had previously superseded purely statistical models such as the word n-gram language model.

Source: Wikipedia
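The word n-gram models mentioned above can be illustrated in a few lines: count adjacent word pairs in a corpus and turn the counts into smoothed conditional probabilities. This is a minimal sketch with a toy corpus; the data and function names are illustrative, not from any particular library.

```python
from collections import Counter

# Toy corpus for a bigram (word 2-gram) language model.
corpus = "the cat sat on the mat . the dog sat on the rug .".split()

vocab = set(corpus)
bigrams = Counter(zip(corpus, corpus[1:]))  # counts of adjacent word pairs
unigrams = Counter(corpus)                  # counts of single words

def prob(word, prev):
    """P(word | prev) with add-one (Laplace) smoothing,
    so unseen pairs still get nonzero probability."""
    return (bigrams[(prev, word)] + 1) / (unigrams[prev] + len(vocab))
```

Smoothing matters because any pair absent from the training data would otherwise get probability zero, making the model assign zero likelihood to whole sentences.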

Papers

Showing 11551–11600 of 17610 papers

Title | Status | Hype
An Empirical Study on Pseudo-log-likelihood Bias Measures for Masked Language Models Using Paraphrased Sentences | - | 0
Unsupervised Paraphrasability Prediction for Compound Nominalizations | - | 0
Zuo Zhuan Ancient Chinese Dataset for Word Sense Disambiguation | - | 0
TUG-CIC at SemEval-2021 Task 6: Two-stage Fine-tuning for Intended Sarcasm Detection | - | 0
You Don’t Know My Favorite Color: Preventing Dialogue Representations from Revealing Speakers’ Private Personas | Code | 0
Uncertainty and Inclusivity in Gender Bias Annotation: An Annotation Taxonomy and Annotated Datasets of British English Text | - | 0
ValCAT: Variable-Length Contextualized Adversarial Transformations Using Encoder-Decoder Language Model | Code | 0
Masking Morphosyntactic Categories to Evaluate Salience for Schizophrenia Diagnosis | - | 0
An Annotated Dataset and Automatic Approaches for Discourse Mode Identification in Low-resource Bengali Language | - | 0
AnaLog: Testing Analytical and Deductive Logic Learnability in Language Models | - | 0
Forecasting Future World Events with Neural Networks | Code | 1
BigBIO: A Framework for Data-Centric Biomedical Natural Language Processing | Code | 2
"Diversity and Uncertainty in Moderation" are the Key to Data Selection for Multilingual Few-shot Transfer | - | 0
Language model compression with weighted low-rank factorization | - | 0
Two-Stage Classifier for COVID-19 Misinformation Detection Using BERT: a Study on Indonesian Tweets | Code | 0
esCorpius: A Massive Spanish Crawling Corpus | - | 0
GSCLIP: A Framework for Explaining Distribution Shifts in Natural Language | - | 0
Improving Visual Grounding by Encouraging Consistent Gradient-based Explanations | Code | 1
GPTs at Factify 2022: Prompt Aided Fact-Verification | - | 0
Improving Deliberation by Text-Only and Semi-Supervised Training | - | 0
Contextual Density Ratio for Language Model Biasing of Sequence to Sequence ASR Systems | - | 0
Towards a Data-Driven Requirements Engineering Approach: Automatic Analysis of User Reviews | Code | 0
Simple and Effective Multi-sentence TTS with Expressive and Coherent Prosody | - | 0
Solving Quantitative Reasoning Problems with Language Models | Code | 2
Knowledge Distillation of Transformer-based Language Models Revisited | - | 0
Adaptive Multi-view Rule Discovery for Weakly-Supervised Compatible Products Prediction | - | 0
CC-Riddle: A Question Answering Dataset of Chinese Character Riddles | Code | 1
Few-Shot Fine-Grained Entity Typing with Automatic Label Interpretation and Instance Generation | Code | 1
Long Range Language Modeling via Gated State Spaces | Code | 0
Bi-VLDoc: Bidirectional Vision-Language Modeling for Visually-Rich Document Understanding | - | 0
A Zero-Shot Classification Approach for a Word-Guessing Challenge | - | 0
Your Autoregressive Generative Model Can be Better If You Treat It as an Energy-Based One | - | 0
Protoformer: Embedding Prototypes for Transformers | Code | 1
TEVR: Improving Speech Recognition by Token Entropy Variance Reduction | Code | 2
Construct a Sentence with Multiple Specified Words | - | 0
Distilling a Pretrained Language Model to a Multilingual ASR Model | Code | 1
Evaluating Generative Patent Language Models | - | 0
Mining Error Templates for Grammatical Error Correction | Code | 2
Self-Supervised Learning of Brain Dynamics from Broad Neuroimaging Data | Code | 1
Efficient and effective training of language and graph neural network models | - | 0
Revisiting Group Differences in High-Dimensional Choices: Method and Application to Congressional Speech | - | 0
DP-Parse: Finding Word Boundaries from Raw Speech with an Instance Lexicon | - | 0
GODEL: Large-Scale Pre-Training for Goal-Directed Dialog | Code | 2
Using cognitive psychology to understand GPT-3 | - | 0
Knowledge Graph Fusion for Language Model Fine-tuning | - | 0
Questions Are All You Need to Train a Dense Passage Retriever | Code | 1
Don't Forget About Pronouns: Removing Gender Bias in Language Models Without Losing Factual Gender Information | - | 0
BenchCLAMP: A Benchmark for Evaluating Language Models on Syntactic and Semantic Parsing | Code | 1
General Framework for Reversible Data Hiding in Texts Based on Masked Language Modeling | - | 0
KnowDA: All-in-One Knowledge Mixture Model for Data Augmentation in Low-Resource NLP | - | 0
Page 232 of 353

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | Decay RNN | Validation perplexity | 76.67 | - | Unverified
2 | GRU | Validation perplexity | 53.78 | - | Unverified
3 | LSTM | Validation perplexity | 52.73 | - | Unverified
4 | LSTM | Test perplexity | 48.7 | - | Unverified
5 | Temporal CNN | Test perplexity | 45.2 | - | Unverified
6 | TCN | Test perplexity | 45.19 | - | Unverified
7 | GCNN-8 | Test perplexity | 44.9 | - | Unverified
8 | Neural cache model (size = 100) | Test perplexity | 44.8 | - | Unverified
9 | Neural cache model (size = 2,000) | Test perplexity | 40.8 | - | Unverified
10 | GPT-2 Small | Test perplexity | 37.5 | - | Unverified
# | Model | Metric | Claimed | Verified | Status
1 | TCN | Test perplexity | 108.47 | - | Unverified
2 | Seq-U-Net | Test perplexity | 107.95 | - | Unverified
3 | GRU (Bai et al., 2018) | Test perplexity | 92.48 | - | Unverified
4 | R-Transformer | Test perplexity | 84.38 | - | Unverified
5 | Zaremba et al. (2014) - LSTM (medium) | Test perplexity | 82.7 | - | Unverified
6 | Gal & Ghahramani (2016) - Variational LSTM (medium) | Test perplexity | 79.7 | - | Unverified
7 | LSTM (Bai et al., 2018) | Test perplexity | 78.93 | - | Unverified
8 | Zaremba et al. (2014) - LSTM (large) | Test perplexity | 78.4 | - | Unverified
9 | Gal & Ghahramani (2016) - Variational LSTM (large) | Test perplexity | 75.2 | - | Unverified
10 | Inan et al. (2016) - Variational RHN | Test perplexity | 66 | - | Unverified
# | Model | Metric | Claimed | Verified | Status
1 | LSTM (7 layers) | Bits per Character (BPC) | 1.67 | - | Unverified
2 | Hypernetworks | Bits per Character (BPC) | 1.34 | - | Unverified
3 | SHA-LSTM (4 layers, h=1024, no attention head) | Bits per Character (BPC) | 1.33 | - | Unverified
4 | LN HM-LSTM | Bits per Character (BPC) | 1.32 | - | Unverified
5 | ByteNet | Bits per Character (BPC) | 1.31 | - | Unverified
6 | Recurrent Highway Networks | Bits per Character (BPC) | 1.27 | - | Unverified
7 | Large FS-LSTM-4 | Bits per Character (BPC) | 1.25 | - | Unverified
8 | Large mLSTM | Bits per Character (BPC) | 1.24 | - | Unverified
9 | AWD-LSTM (3 layers) | Bits per Character (BPC) | 1.23 | - | Unverified
10 | Cluster-Former (#C=512) | Bits per Character (BPC) | 1.22 | - | Unverified
# | Model | Metric | Claimed | Verified | Status
1 | Smaller Transformer 126M (pre-trained) | Test perplexity | 33 | - | Unverified
2 | OPT 125M | Test perplexity | 32.26 | - | Unverified
3 | Larger Transformer 771M (pre-trained) | Test perplexity | 28.1 | - | Unverified
4 | OPT 1.3B | Test perplexity | 19.55 | - | Unverified
5 | GPT-Neo 125M | Test perplexity | 17.83 | - | Unverified
6 | OPT 2.7B | Test perplexity | 17.81 | - | Unverified
7 | Smaller Transformer 126M (fine-tuned) | Test perplexity | 12 | - | Unverified
8 | GPT-Neo 1.3B | Test perplexity | 11.46 | - | Unverified
9 | Transformer 125M | Test perplexity | 10.7 | - | Unverified
10 | GPT-Neo 2.7B | Test perplexity | 10.44 | - | Unverified
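Both metrics in the tables above are monotone transforms of the model's average negative log-likelihood: perplexity exponentiates the per-token loss in nats (lower is better), while bits per character divides the per-character loss by ln 2. The sketch below shows the standard definitions; the input lists of per-token and per-character losses are hypothetical.

```python
import math

def perplexity(token_nlls):
    """Perplexity = exp(mean negative log-likelihood per token, in nats).
    A model that is always certain scores 1; a uniform model over a
    vocabulary of size V scores exactly V."""
    return math.exp(sum(token_nlls) / len(token_nlls))

def bits_per_character(char_nlls):
    """BPC = mean negative log-likelihood per character, converted
    from nats to bits by dividing by ln 2."""
    return (sum(char_nlls) / len(char_nlls)) / math.log(2)
```

This is why character-level benchmarks report BPC near 1–2 while word-level benchmarks report perplexities in the tens: a loss of 1.2 bits per character corresponds to a per-character "perplexity" of only about 2.3, but word-level uncertainty compounds over much larger vocabularies.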