SOTAVerified

Language Modelling

A language model is a model of natural language. Language models are useful for a variety of tasks, including speech recognition, machine translation, natural language generation (generating more human-like text), optical character recognition, route optimization, handwriting recognition, grammar induction, and information retrieval.

Large language models (LLMs), currently their most advanced form, are predominantly based on transformers trained on large datasets (frequently using text scraped from the public internet). They have superseded recurrent neural network-based models, which had in turn superseded purely statistical models such as the word n-gram language model.
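For concreteness, the statistical baseline mentioned above estimates P(word | previous n−1 words) from corpus counts. The following is a minimal sketch of a bigram (n = 2) language model with add-alpha smoothing and a perplexity evaluation; the function names, smoothing choice, and toy corpus are illustrative, not drawn from any paper listed here.

```python
import math
from collections import Counter

def train_bigram_lm(corpus, alpha=1.0):
    """Count unigrams and bigrams over a tokenized corpus.
    alpha is an add-alpha (Laplace) smoothing constant -- an
    illustrative choice, not tied to any particular paper."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in corpus:
        tokens = ["<s>"] + sentence + ["</s>"]
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams, len(unigrams), alpha

def bigram_logprob(model, prev, word):
    """Smoothed log P(word | prev)."""
    unigrams, bigrams, vocab_size, alpha = model
    numerator = bigrams[(prev, word)] + alpha
    denominator = unigrams[prev] + alpha * vocab_size
    return math.log(numerator / denominator)

def perplexity(model, corpus):
    """Perplexity: exp of the mean negative log-likelihood per token."""
    total_logprob, total_tokens = 0.0, 0
    for sentence in corpus:
        tokens = ["<s>"] + sentence + ["</s>"]
        for prev, word in zip(tokens, tokens[1:]):
            total_logprob += bigram_logprob(model, prev, word)
            total_tokens += 1
    return math.exp(-total_logprob / total_tokens)

corpus = [["the", "cat", "sat"], ["the", "dog", "sat"]]
model = train_bigram_lm(corpus)
print(perplexity(model, corpus))  # low on its own training data, as expected
```

Neural language models replace these count tables with learned parameters, but the evaluation loop (average negative log-likelihood over held-out text) is the same one behind the perplexity numbers reported further down.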

Source: Wikipedia

Papers

Showing 17501–17550 of 17610 papers

Title | Status | Hype
Faithful and Plausible Natural Language Explanations for Image Classification: A Pipeline Approach | Code | 0
Automated Coding of Under-Studied Medical Concept Domains: Linking Physical Activity Reports to the International Classification of Functioning, Disability, and Health | Code | 0
A City of Millions: Mapping Literary Social Networks At Scale | Code | 0
FakeClaim: A Multiple Platform-driven Dataset for Identification of Fake News on 2023 Israel-Hamas War | Code | 0
Information Guided Regularization for Fine-tuning Language Models | Code | 0
Fake news detection using Deep Learning | Code | 0
Automated Audio Captioning by Fine-Tuning BART with AudioSet Tags | Code | 0
Autoencoding Undirected Molecular Graphs With Neural Networks | Code | 0
AnchiBERT: A Pre-Trained Model for Ancient Chinese Language Understanding and Generation | Code | 0
Falcon 2.0: An Entity and Relation Linking Tool over Wikidata | Code | 0
Autoencoding Pixies: Amortised Variational Inference with Graph Convolutions for Functional Distributional Semantics | Code | 0
HypoTermQA: Hypothetical Terms Dataset for Benchmarking Hallucination Tendency of LLMs | Code | 0
Autoencoders as Tools for Program Synthesis | Code | 0
Comparing Template-based and Template-free Language Model Probing | Code | 0
Comparing Optimization Targets for Contrast-Consistent Search | Code | 0
Information-Restricted Neural Language Models Reveal Different Brain Regions' Sensitivity to Semantics, Syntax and Context | Code | 0
Community Needs and Assets: A Computational Analysis of Community Conversations | Code | 0
Fantastic Semantics and Where to Find Them: Investigating Which Layers of Generative LLMs Reflect Lexical Semantics | Code | 0
Efficient Attention via Pre-Scoring: Prioritizing Informative Keys in Transformers | Code | 0
Farewell to Aimless Large-scale Pretraining: Influential Subset Selection for Language Model | Code | 0
Anatomy of Neural Language Models | Code | 0
Adapting Multi-modal Large Language Model to Concept Drift From Pre-training Onwards | Code | 0
An Approach for Text Steganography Based on Markov Chains | Code | 0
GradBias: Unveiling Word Influence on Bias in Text-to-Image Generative Models | Code | 0
MiniDisc: Minimal Distillation Schedule for Language Model Compression | Code | 0
Specializing Unsupervised Pretraining Models for Word-Level Semantic Similarity | Code | 0
FASPell: A Fast, Adaptable, Simple, Powerful Chinese Spell Checker Based On DAE-Decoder Paradigm | Code | 0
I-BERT: Inductive Generalization of Transformer to Arbitrary Context Lengths | Code | 0
Gradient-based learning applied to document recognition | Code | 0
Commonsense Knowledge Mining from Pretrained Models | Code | 0
Common-Knowledge Concept Recognition for SEVA | Code | 0
Adapting Multilingual LLMs to Low-Resource Languages with Knowledge Graphs via Adapters | Code | 0
AutoCLIP: Auto-tuning Zero-Shot Classifiers for Vision-Language Models | Code | 0
An Analysis of Neural Language Modeling at Multiple Scales | Code | 0
Analyzing Speaker Information in Self-Supervised Models to Improve Zero-Resource Speech Processing | Code | 0
Gradual Learning of Recurrent Neural Networks | Code | 0
Internal Causal Mechanisms Robustly Predict Language Model Out-of-Distribution Behaviors | Code | 0
GRAMMAR: Grounded and Modular Methodology for Assessment of Closed-Domain Retrieval-Augmented Language Model | Code | 0
Grammar Induction with Neural Language Models: An Unusual Replication | Code | 0
Combining inherent knowledge of vision-language models with unsupervised domain adaptation through strong-weak guidance | Code | 0
Improving Retrieval Augmented Open-Domain Question-Answering with Vectorized Contexts | Code | 0
Analyzing Phonetic and Graphemic Representations in End-to-End Automatic Speech Recognition | Code | 0
Iceberg: Enhancing HLS Modeling with Synthetic Data | Code | 0
Combining Analogy with Language Models for Knowledge Extraction | Code | 0
Improving the Data-efficiency of Reinforcement Learning by Warm-starting with LLM | Code | 0
Adapting Learned Sparse Retrieval for Long Documents | Code | 0
AdapterEM: Pre-trained Language Model Adaptation for Generalized Entity Matching using Adapter-tuning | Code | 0
AutoCast++: Enhancing World Event Prediction with Zero-shot Ranking-based Context Retrieval | Code | 0
Combiner: Full Attention Transformer with Sparse Computation Cost | Code | 0
Lisbon Computational Linguists at SemEval-2024 Task 2: Using A Mistral 7B Model and Data Augmentation | Code | 0

Benchmark Results
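Each of the four tables below covers a distinct benchmark; all figures are claimed values pending verification, so the Verified column is empty throughout. Two metrics appear: perplexity, the exponential of the mean negative log-likelihood per token (lower is better), and bits per character (BPC), the base-2 logarithm of per-character perplexity. A minimal sketch of the two conversions follows; the helper names and example inputs are illustrative, not taken from the tables.

```python
import math

def perplexity_from_nll(mean_nll_nats: float) -> float:
    """Perplexity is the exponential of the mean negative
    log-likelihood per token (natural log)."""
    return math.exp(mean_nll_nats)

def bpc_from_char_perplexity(char_ppl: float) -> float:
    """Bits per character is the base-2 log of
    per-character perplexity: BPC = log2(ppl_char)."""
    return math.log2(char_ppl)

# A mean NLL of 3.624 nats/token corresponds to perplexity ~37.5;
# a per-character perplexity of 2.33 corresponds to ~1.22 BPC.
print(perplexity_from_nll(3.624))      # ~37.49
print(bpc_from_char_perplexity(2.33))  # ~1.22
```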

# | Model | Metric | Claimed | Verified | Status
1 | Decay RNN | Validation perplexity | 76.67 | – | Unverified
2 | GRU | Validation perplexity | 53.78 | – | Unverified
3 | LSTM | Validation perplexity | 52.73 | – | Unverified
4 | LSTM | Test perplexity | 48.7 | – | Unverified
5 | Temporal CNN | Test perplexity | 45.2 | – | Unverified
6 | TCN | Test perplexity | 45.19 | – | Unverified
7 | GCNN-8 | Test perplexity | 44.9 | – | Unverified
8 | Neural cache model (size = 100) | Test perplexity | 44.8 | – | Unverified
9 | Neural cache model (size = 2,000) | Test perplexity | 40.8 | – | Unverified
10 | GPT-2 Small | Test perplexity | 37.5 | – | Unverified
# | Model | Metric | Claimed | Verified | Status
1 | TCN | Test perplexity | 108.47 | – | Unverified
2 | Seq-U-Net | Test perplexity | 107.95 | – | Unverified
3 | GRU (Bai et al., 2018) | Test perplexity | 92.48 | – | Unverified
4 | R-Transformer | Test perplexity | 84.38 | – | Unverified
5 | Zaremba et al. (2014) - LSTM (medium) | Test perplexity | 82.7 | – | Unverified
6 | Gal & Ghahramani (2016) - Variational LSTM (medium) | Test perplexity | 79.7 | – | Unverified
7 | LSTM (Bai et al., 2018) | Test perplexity | 78.93 | – | Unverified
8 | Zaremba et al. (2014) - LSTM (large) | Test perplexity | 78.4 | – | Unverified
9 | Gal & Ghahramani (2016) - Variational LSTM (large) | Test perplexity | 75.2 | – | Unverified
10 | Inan et al. (2016) - Variational RHN | Test perplexity | 66 | – | Unverified
# | Model | Metric | Claimed | Verified | Status
1 | LSTM (7 layers) | Bit per Character (BPC) | 1.67 | – | Unverified
2 | Hypernetworks | Bit per Character (BPC) | 1.34 | – | Unverified
3 | SHA-LSTM (4 layers, h=1024, no attention head) | Bit per Character (BPC) | 1.33 | – | Unverified
4 | LN HM-LSTM | Bit per Character (BPC) | 1.32 | – | Unverified
5 | ByteNet | Bit per Character (BPC) | 1.31 | – | Unverified
6 | Recurrent Highway Networks | Bit per Character (BPC) | 1.27 | – | Unverified
7 | Large FS-LSTM-4 | Bit per Character (BPC) | 1.25 | – | Unverified
8 | Large mLSTM | Bit per Character (BPC) | 1.24 | – | Unverified
9 | AWD-LSTM (3 layers) | Bit per Character (BPC) | 1.23 | – | Unverified
10 | Cluster-Former (#C=512) | Bit per Character (BPC) | 1.22 | – | Unverified
# | Model | Metric | Claimed | Verified | Status
1 | Smaller Transformer 126M (pre-trained) | Test perplexity | 33 | – | Unverified
2 | OPT 125M | Test perplexity | 32.26 | – | Unverified
3 | Larger Transformer 771M (pre-trained) | Test perplexity | 28.1 | – | Unverified
4 | OPT 1.3B | Test perplexity | 19.55 | – | Unverified
5 | GPT-Neo 125M | Test perplexity | 17.83 | – | Unverified
6 | OPT 2.7B | Test perplexity | 17.81 | – | Unverified
7 | Smaller Transformer 126M (fine-tuned) | Test perplexity | 12 | – | Unverified
8 | GPT-Neo 1.3B | Test perplexity | 11.46 | – | Unverified
9 | Transformer 125M | Test perplexity | 10.7 | – | Unverified
10 | GPT-Neo 2.7B | Test perplexity | 10.44 | – | Unverified