
Language Modelling

A language model is a model of natural language. Language models are useful for a variety of tasks, including speech recognition, machine translation, natural language generation (generating more human-like text), optical character recognition, route optimization, handwriting recognition, grammar induction, and information retrieval.

Large language models (LLMs), currently their most advanced form, are predominantly based on transformers trained on larger datasets (frequently using text scraped from the public internet). They have superseded recurrent neural network-based models, which had previously superseded purely statistical models such as word n-gram language models.
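
A word n-gram model estimates the probability of each word from the few words before it and scores a sentence as the product of those conditional probabilities. The sketch below shows the simplest case, a word bigram model (n = 2) trained by counting. The toy corpus, the `<s>`/`</s>` sentence-boundary markers and the function names `train_bigram` and `sentence_prob` are illustrative assumptions. It also leaves out the smoothing that practical n-gram models use for unseen word pairs.

```python
from collections import Counter, defaultdict

# Hypothetical toy corpus, for illustration only.
corpus = [
    "the cat sat",
    "the cat ran",
    "the dog sat",
]


def train_bigram(sentences):
    """Count word pairs and return a maximum-likelihood estimate of P(word | prev)."""
    bigram_counts = defaultdict(Counter)
    for sentence in sentences:
        # <s> and </s> are assumed boundary markers for sentence start and end.
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        for prev, word in zip(tokens, tokens[1:]):
            bigram_counts[prev][word] += 1

    def prob(word, prev):
        total = sum(bigram_counts[prev].values())
        # No smoothing: an unseen context or unseen pair gets probability 0.
        return bigram_counts[prev][word] / total if total else 0.0

    return prob


def sentence_prob(prob, sentence):
    """Score a sentence as a product of bigram probabilities (chain rule + Markov assumption)."""
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    p = 1.0
    for prev, word in zip(tokens, tokens[1:]):
        p *= prob(word, prev)
    return p


prob = train_bigram(corpus)
print(prob("cat", "the"))                  # "the" is followed by "cat" in 2 of 3 sentences
print(sentence_prob(prob, "the cat sat"))  # 1 * 2/3 * 1/2 * 1
print(sentence_prob(prob, "the dog ran"))  # "dog ran" never occurs, so 0.0
```

The unsmoothed zero for "the dog ran" shows the data-sparsity problem that pushed the field toward smoothed n-gram models and, later, neural models that generalize across similar contexts.
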

Source: Wikipedia

Papers

Showing 8851–8900 of 17610 papers

| Title | Status | Hype |
|---|---|---|
| Language Modelling Approaches to Adaptive Machine Translation | | 0 |
| Language Modelling as a Multi-Task Problem | | 0 |
| Language Modelling for Speaker Diarization in Telephonic Interviews | | 0 |
| Language modelling techniques for analysing the impact of human genetic variation | | 0 |
| Language Modelling via Learning to Rank | | 0 |
| Language Modelling with NMT Query Translation for Amharic-Arabic Cross-Language Information Retrieval | | 0 |
| Language Model Meets Prototypes: Towards Interpretable Text Classification Models through Prototypical Networks | | 0 |
| Language Model Metrics and Procrustes Analysis for Improved Vector Transformation of NLP Embeddings | | 0 |
| Language Model Personalization via Reward Factorization | | 0 |
| Language Model Pretraining and Transfer Learning for Very Low Resource Languages | | 0 |
| Language Model Pre-training for Hierarchical Document Representations | | 0 |
| Language Model Pre-training Improves Generalization in Policy Learning | | 0 |
| Language Model Pre-training on True Negatives | | 0 |
| Language Model Priming for Cross-Lingual Event Extraction | | 0 |
| Language Model Prompt Selection via Simulation Optimization | | 0 |
| Language Model Re-rankers are Steered by Lexical Similarities | | 0 |
| Language Model Rest Costs and Space-Efficient Storage | | 0 |
| Language Models: A Guide for the Perplexed | | 0 |
| Language models and brains align due to more than next-word prediction and word-level information | | 0 |
| Language Models are General-Purpose Interfaces | | 0 |
| Language Models are Good Translators | | 0 |
| Large Language Models are not Models of Natural Language: they are Corpus Models | | 0 |
| Language Models are Symbolic Learners in Arithmetic | | 0 |
| Language models are weak learners | | 0 |
| Language Models as a Knowledge Source for Cognitive Agents | | 0 |
| Language Models as Emotional Classifiers for Textual Conversations | | 0 |
| Language Models as Fact Checkers? | | 0 |
| Language Models can be Logical Solvers | | 0 |
| Language Model Self-improvement by Reinforcement Learning Contemplation | | 0 |
| Language Models for Cloze Task Answer Generation in Russian | | 0 |
| Language Models for Image Captioning: The Quirks and What Works | | 0 |
| Language Models for Machine Translation: Original vs. Translated Texts | | 0 |
| Language Models for Novelty Detection in System Call Traces | | 0 |
| Language Models Get a Gender Makeover: Mitigating Gender Bias with Few-Shot Data Interventions | | 0 |
| Language Models, Graph Searching, and Supervision Adulteration: When More Supervision is Less and How to Make More More | | 0 |
| Language Models "Grok" to Copy | | 0 |
| Language models in molecular discovery | | 0 |
| Language Models Learn POS First | | 0 |
| Language models of protein sequences at the scale of evolution enable accurate structure prediction | | 0 |
| Language Models of Spoken Dutch | | 0 |
| Language Models sounds the Death Knell of Knowledge Graphs | | 0 |
| Language Models that Seek for Knowledge: Modular Search & Generation for Dialogue and Prompt Completion | | 0 |
| Language Model Supervision for Handwriting Recognition Model Adaptation | | 0 |
| Language Models Use Trigonometry to Do Addition | | 0 |
| Language Models with Conformal Factuality Guarantees | | 0 |
| Language Production Dynamics with Recurrent Neural Networks | | 0 |
| Language-Queried Target Sound Extraction Without Parallel Training Data | | 0 |
| Language Rectified Flow: Advancing Diffusion Language Generation with Probabilistic Flows | | 0 |
| LanguageRefer: Spatial-Language Model for 3D Visual Grounding | | 0 |
| Language Resources for Dutch Large Language Modelling | | 0 |
Page 178 of 353

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | Decay RNN | Validation perplexity | 76.67 | | Unverified |
| 2 | GRU | Validation perplexity | 53.78 | | Unverified |
| 3 | LSTM | Validation perplexity | 52.73 | | Unverified |
| 4 | LSTM | Test perplexity | 48.7 | | Unverified |
| 5 | Temporal CNN | Test perplexity | 45.2 | | Unverified |
| 6 | TCN | Test perplexity | 45.19 | | Unverified |
| 7 | GCNN-8 | Test perplexity | 44.9 | | Unverified |
| 8 | Neural cache model (size = 100) | Test perplexity | 44.8 | | Unverified |
| 9 | Neural cache model (size = 2,000) | Test perplexity | 40.8 | | Unverified |
| 10 | GPT-2 Small | Test perplexity | 37.5 | | Unverified |
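
Most of these tables report perplexity: the exponential of the average negative log-likelihood that a model assigns to each token of held-out text, so lower is better. The sketch below computes it from per-token probabilities. The probabilities are made up and are not taken from any model listed here. The helper name `perplexity` is hypothetical. Natural logarithms are assumed, but any base gives the same result as long as the exponent uses the same base.

```python
import math


def perplexity(token_probs):
    """Exponential of the mean negative log-likelihood per token."""
    if not token_probs:
        raise ValueError("need at least one token probability")
    mean_nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(mean_nll)


# Hypothetical probabilities a model assigned to each token of a held-out text.
probs = [0.25, 0.5, 0.125, 0.25]
print(perplexity(probs))        # approximately 4

# A model that is uniformly unsure among 10 choices at every step.
print(perplexity([0.1] * 10))   # approximately 10
```

The uniform case shows why perplexity is often read as an effective branching factor. Values are only comparable when they are computed on the same evaluation text with the same tokenization.
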

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | TCN | Test perplexity | 108.47 | | Unverified |
| 2 | Seq-U-Net | Test perplexity | 107.95 | | Unverified |
| 3 | GRU (Bai et al., 2018) | Test perplexity | 92.48 | | Unverified |
| 4 | R-Transformer | Test perplexity | 84.38 | | Unverified |
| 5 | Zaremba et al. (2014) - LSTM (medium) | Test perplexity | 82.7 | | Unverified |
| 6 | Gal & Ghahramani (2016) - Variational LSTM (medium) | Test perplexity | 79.7 | | Unverified |
| 7 | LSTM (Bai et al., 2018) | Test perplexity | 78.93 | | Unverified |
| 8 | Zaremba et al. (2014) - LSTM (large) | Test perplexity | 78.4 | | Unverified |
| 9 | Gal & Ghahramani (2016) - Variational LSTM (large) | Test perplexity | 75.2 | | Unverified |
| 10 | Inan et al. (2016) - Variational RHN | Test perplexity | 66 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | LSTM (7 layers) | Bits per character (BPC) | 1.67 | | Unverified |
| 2 | Hypernetworks | Bits per character (BPC) | 1.34 | | Unverified |
| 3 | SHA-LSTM (4 layers, h=1024, no attention head) | Bits per character (BPC) | 1.33 | | Unverified |
| 4 | LN HM-LSTM | Bits per character (BPC) | 1.32 | | Unverified |
| 5 | ByteNet | Bits per character (BPC) | 1.31 | | Unverified |
| 6 | Recurrent Highway Networks | Bits per character (BPC) | 1.27 | | Unverified |
| 7 | Large FS-LSTM-4 | Bits per character (BPC) | 1.25 | | Unverified |
| 8 | Large mLSTM | Bits per character (BPC) | 1.24 | | Unverified |
| 9 | AWD-LSTM (3 layers) | Bits per character (BPC) | 1.23 | | Unverified |
| 10 | Cluster-Former (#C=512) | Bits per character (BPC) | 1.22 | | Unverified |
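
The character-level table reports bits per character (BPC): the average negative base-2 log-probability that a model assigns to each character of held-out text, so lower is better. The sketch below uses made-up character probabilities, and the helper name `bits_per_character` is hypothetical. It also shows that raising 2 to the BPC gives the equivalent per-character perplexity.

```python
import math


def bits_per_character(char_probs):
    """Mean negative log2-probability assigned to each character."""
    if not char_probs:
        raise ValueError("need at least one character probability")
    return -sum(math.log2(p) for p in char_probs) / len(char_probs)


# Hypothetical probabilities a model assigned to each character of a short string.
probs = [0.5, 0.25, 0.5, 0.5]
bpc = bits_per_character(probs)
print(bpc)        # (1 + 2 + 1 + 1) / 4 = 1.25
print(2 ** bpc)   # equivalent per-character perplexity
```

Because BPC is measured in bits, a model that guesses uniformly among 256 byte values scores 8 BPC. Every figure in the table is far below that.
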

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | Smaller Transformer 126M (pre-trained) | Test perplexity | 33 | | Unverified |
| 2 | OPT 125M | Test perplexity | 32.26 | | Unverified |
| 3 | Larger Transformer 771M (pre-trained) | Test perplexity | 28.1 | | Unverified |
| 4 | OPT 1.3B | Test perplexity | 19.55 | | Unverified |
| 5 | GPT-Neo 125M | Test perplexity | 17.83 | | Unverified |
| 6 | OPT 2.7B | Test perplexity | 17.81 | | Unverified |
| 7 | Smaller Transformer 126M (fine-tuned) | Test perplexity | 12 | | Unverified |
| 8 | GPT-Neo 1.3B | Test perplexity | 11.46 | | Unverified |
| 9 | Transformer 125M | Test perplexity | 10.7 | | Unverified |
| 10 | GPT-Neo 2.7B | Test perplexity | 10.44 | | Unverified |