SOTAVerified

Language Modelling

A language model is a model of natural language. Language models are useful for a variety of tasks, including speech recognition, machine translation, natural language generation (generating more human-like text), optical character recognition, route optimization, handwriting recognition, grammar induction, and information retrieval.

Large language models (LLMs), currently their most advanced form, are predominantly based on transformers trained on larger datasets (frequently using words scraped from the public internet). They have superseded recurrent neural network-based models, which had previously superseded the purely statistical models, such as the word n-gram language model.

Source: Wikipedia
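
The "word n-gram language model" mentioned above is the simplest purely statistical form: it estimates the probability of each word from counts of the n-1 words that precede it in a training corpus. Below is a minimal sketch of a bigram (n = 2) model with add-one smoothing; the class and method names are illustrative, not from any particular library.

```python
from collections import defaultdict

class BigramLM:
    """Minimal word bigram language model with add-one (Laplace) smoothing.
    Illustrative sketch only; practical n-gram models use better smoothing
    (e.g. Kneser-Ney) and back-off to lower-order n-grams."""

    def __init__(self):
        self.bigram_counts = defaultdict(lambda: defaultdict(int))
        self.context_counts = defaultdict(int)
        self.vocab = set()

    def train(self, sentences):
        for sentence in sentences:
            tokens = ["<s>"] + sentence.split() + ["</s>"]
            self.vocab.update(tokens)
            for prev, curr in zip(tokens, tokens[1:]):
                self.bigram_counts[prev][curr] += 1
                self.context_counts[prev] += 1

    def prob(self, prev, curr):
        # P(curr | prev) with add-one smoothing over the vocabulary
        v = len(self.vocab)
        return (self.bigram_counts[prev][curr] + 1) / (self.context_counts[prev] + v)

lm = BigramLM()
lm.train(["the cat sat on the mat", "the dog sat on the rug"])
print(lm.prob("the", "cat"))  # probability of "cat" following "the"
```

Recurrent neural networks and, more recently, transformers address the same next-word prediction task, but replace these count-based estimates with learned distributions conditioned on much longer contexts.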

Papers

Showing 5051–5100 of 17610 papers

Title | Status | Hype
DICE: Discrete Inversion Enabling Controllable Editing for Multinomial Diffusion and Masked Generative Models |  | 0
Dick-Preston and Morbo at SemEval-2019 Task 4: Transfer Learning for Hyperpartisan News Detection |  | 0
DictBERT: Dictionary Description Knowledge Enhanced Language Model Pre-training via Contrastive Learning |  | 0
DictFormer: Tiny Transformer with Shared Dictionary |  | 0
DICT-MLM: Improved Multilingual Pre-Training using Bilingual Dictionaries |  | 0
DIDS: Domain Impact-aware Data Sampling for Large Language Model Training |  | 0
DiffCSS: Diverse and Expressive Conversational Speech Synthesis with Diffusion Models |  | 0
Differentiable Neural Architecture Search with Morphism-based Transformable Backbone Architectures |  | 0
Differentiable Retrieval Augmentation via Generative Language Modeling for E-commerce Query Intent Classification |  | 0
Differentiable Window for Dynamic Local Attention |  | 0
Differentially Private Decoding in Large Language Models |  | 0
Differentially Private Distributed Learning for Language Modeling Tasks |  | 0
Differentially Private Language Models Benefit from Public Pre-training |  | 0
Differentially Private Language Models for Secure Data Sharing |  | 0
Differentially Private Low-Rank Adaptation of Large Language Model Using Federated Learning |  | 0
Differentially Private Meta-Learning |  | 0
Differentially Private Zeroth-Order Methods for Scalable Large Language Model Finetuning |  | 0
Different Strokes for Different Folks: Investigating Appropriate Further Pre-training Approaches for Diverse Dialogue Tasks |  | 0
Different Tokenization Schemes Lead to Comparable Performance in Spanish Number Agreement |  | 0
DiffGen: Robot Demonstration Generation via Differentiable Physics Simulation, Differentiable Rendering, and Vision-Language Model |  | 0
Difficulty-Focused Contrastive Learning for Knowledge Tracing with a Large Language Model-Based Difficulty Prediction |  | 0
Diff-Restorer: Unleashing Visual Prompts for Diffusion-based Universal Image Restoration |  | 0
DiffSensei: Bridging Multi-Modal LLMs and Diffusion Models for Customized Manga Generation |  | 0
DiffusEmp: A Diffusion Model-Based Framework with Multi-Grained Control for Empathetic Response Generation |  | 0
Diffusion Augmented Agents: A Framework for Efficient Exploration and Transfer Learning |  | 0
Diffusion based Text-to-Music Generation with Global and Local Text based Conditioning |  | 0
DiffusionBlocks: Blockwise Training for Generative Models via Score-Based Diffusion |  | 0
Diffusion Models for Open-Vocabulary Segmentation |  | 0
Diffusion on language model encodings for protein sequence generation |  | 0
Diffusion Self-Distillation for Zero-Shot Customized Image Generation |  | 0
Diffusion vs. Autoregressive Language Models: A Text Embedding Perspective |  | 0
DiffVLA: Vision-Language Guided Diffusion Planning for Autonomous Driving |  | 0
Diformer: Directional Transformer for Neural Machine Translation |  | 0
Digger: Detecting Copyright Content Mis-usage in Large Language Model Training |  | 0
Digital Avatars: Framework Development and Their Evaluation |  | 0
Digital Business Model Analysis Using a Large Language Model |  | 0
Digital Twin Buildings: 3D Modeling, GIS Integration, and Visual Descriptions Using Gaussian Splatting, ChatGPT/Deepseek, and Google Maps Platform |  | 0
DiMBERT: Learning Vision-Language Grounded Representations with Disentangled Multimodal-Attention |  | 0
DINT Transformer |  | 0
DiPaCo: Distributed Path Composition |  | 0
Dipper: Diversity in Prompts for Producing Large Language Model Ensembles in Reasoning tasks |  | 0
DiPT: Enhancing LLM reasoning through diversified perspective-taking |  | 0
dIR -- Discrete Information Retrieval: Conversational Search over Unstructured (and Structured) Data with Large Language Models |  | 0
Direct Acoustics-to-Word Models for English Conversational Speech Recognition |  | 0
Direct Fact Retrieval from Knowledge Graphs without Entity Linking |  | 0
Direct Language Model Alignment from Online AI Feedback |  | 0
DIRECTOR: Generator-Classifiers For Supervised Language Modeling |  | 0
DirectorLLM for Human-Centric Video Generation |  | 0
DIRI: Adversarial Patient Reidentification with Large Language Models for Evaluating Clinical Text Anonymization |  | 0
Disaggregating Hops: Can We Guide a Multi-Hop Reasoning Language Model to Incrementally Learn at each Hop? |  | 0
Page 102 of 353

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | Decay RNN | Validation perplexity | 76.67 |  | Unverified
2 | GRU | Validation perplexity | 53.78 |  | Unverified
3 | LSTM | Validation perplexity | 52.73 |  | Unverified
4 | LSTM | Test perplexity | 48.7 |  | Unverified
5 | Temporal CNN | Test perplexity | 45.2 |  | Unverified
6 | TCN | Test perplexity | 45.19 |  | Unverified
7 | GCNN-8 | Test perplexity | 44.9 |  | Unverified
8 | Neural cache model (size = 100) | Test perplexity | 44.8 |  | Unverified
9 | Neural cache model (size = 2,000) | Test perplexity | 40.8 |  | Unverified
10 | GPT-2 Small | Test perplexity | 37.5 |  | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | TCN | Test perplexity | 108.47 |  | Unverified
2 | Seq-U-Net | Test perplexity | 107.95 |  | Unverified
3 | GRU (Bai et al., 2018) | Test perplexity | 92.48 |  | Unverified
4 | R-Transformer | Test perplexity | 84.38 |  | Unverified
5 | Zaremba et al. (2014) - LSTM (medium) | Test perplexity | 82.7 |  | Unverified
6 | Gal & Ghahramani (2016) - Variational LSTM (medium) | Test perplexity | 79.7 |  | Unverified
7 | LSTM (Bai et al., 2018) | Test perplexity | 78.93 |  | Unverified
8 | Zaremba et al. (2014) - LSTM (large) | Test perplexity | 78.4 |  | Unverified
9 | Gal & Ghahramani (2016) - Variational LSTM (large) | Test perplexity | 75.2 |  | Unverified
10 | Inan et al. (2016) - Variational RHN | Test perplexity | 66 |  | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | LSTM (7 layers) | Bit per Character (BPC) | 1.67 |  | Unverified
2 | Hypernetworks | Bit per Character (BPC) | 1.34 |  | Unverified
3 | SHA-LSTM (4 layers, h=1024, no attention head) | Bit per Character (BPC) | 1.33 |  | Unverified
4 | LN HM-LSTM | Bit per Character (BPC) | 1.32 |  | Unverified
5 | ByteNet | Bit per Character (BPC) | 1.31 |  | Unverified
6 | Recurrent Highway Networks | Bit per Character (BPC) | 1.27 |  | Unverified
7 | Large FS-LSTM-4 | Bit per Character (BPC) | 1.25 |  | Unverified
8 | Large mLSTM | Bit per Character (BPC) | 1.24 |  | Unverified
9 | AWD-LSTM (3 layers) | Bit per Character (BPC) | 1.23 |  | Unverified
10 | Cluster-Former (#C=512) | Bit per Character (BPC) | 1.22 |  | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Smaller Transformer 126M (pre-trained) | Test perplexity | 33 |  | Unverified
2 | OPT 125M | Test perplexity | 32.26 |  | Unverified
3 | Larger Transformer 771M (pre-trained) | Test perplexity | 28.1 |  | Unverified
4 | OPT 1.3B | Test perplexity | 19.55 |  | Unverified
5 | GPT-Neo 125M | Test perplexity | 17.83 |  | Unverified
6 | OPT 2.7B | Test perplexity | 17.81 |  | Unverified
7 | Smaller Transformer 126M (fine-tuned) | Test perplexity | 12 |  | Unverified
8 | GPT-Neo 1.3B | Test perplexity | 11.46 |  | Unverified
9 | Transformer 125M | Test perplexity | 10.7 |  | Unverified
10 | GPT-Neo 2.7B | Test perplexity | 10.44 |  | Unverified
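
The tables above report either perplexity or bits per character (BPC). Both are monotone transformations of a model's average cross-entropy on held-out text, so lower is better in every table. The sketch below shows the usual conversions; the loss values passed in are placeholders for illustration, not numbers taken from any model listed here.

```python
import math

def perplexity(avg_nll_nats: float) -> float:
    """Perplexity from average negative log-likelihood per token (natural log)."""
    return math.exp(avg_nll_nats)

def bits_per_character(avg_nll_nats_per_char: float) -> float:
    """BPC from average negative log-likelihood per character (natural log)."""
    return avg_nll_nats_per_char / math.log(2)

# Placeholder losses, chosen only to illustrate the scale of the conversions:
print(perplexity(3.62))          # ~37.3 (word-level perplexity)
print(bits_per_character(0.85))  # ~1.23 (character-level BPC)
```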