Language Modelling

A language model is a model of natural language. Language models are useful for a variety of tasks, including speech recognition, machine translation, natural language generation (generating more human-like text), optical character recognition, route optimization, handwriting recognition, grammar induction, and information retrieval.

Large language models (LLMs), currently their most advanced form, are predominantly based on transformers trained on larger datasets (frequently using words scraped from the public internet). They have superseded recurrent neural network-based models, which had previously superseded purely statistical models such as the word n-gram language model.

Source: Wikipedia
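
As an illustration of the purely statistical models mentioned above, here is a minimal sketch of a word bigram (n = 2) language model. The toy corpus, the function names, and the choice of add-one (Laplace) smoothing are assumptions made for this example and are not taken from the description above.

```python
from collections import Counter

def train_bigram(tokens):
    """Count contexts and bigrams in a list of tokens."""
    # Every token except the last one serves as a context for a following word.
    contexts = Counter(tokens[:-1])
    bigrams = Counter(zip(tokens, tokens[1:]))
    vocab = set(tokens)
    return contexts, bigrams, vocab

def bigram_prob(prev, word, contexts, bigrams, vocab):
    """Estimate P(word | prev) with add-one (Laplace) smoothing."""
    return (bigrams[(prev, word)] + 1) / (contexts[prev] + len(vocab))

# Toy corpus, purely illustrative.
corpus = "the cat sat on the mat the cat ate".split()
contexts, bigrams, vocab = train_bigram(corpus)

p_cat_given_the = bigram_prob("the", "cat", contexts, bigrams, vocab)
# The smoothed estimates for a fixed context sum to 1 over the vocabulary.
total_given_the = sum(bigram_prob("the", w, contexts, bigrams, vocab) for w in vocab)
```

With this corpus, "the" is followed by "cat" twice out of three occurrences and the vocabulary has six words, so the smoothed estimate is (2 + 1) / (3 + 6) = 1/3.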

Papers

Title	Date	Tasks	Status
OmniV-Med: Scaling Medical Vision-Language Model for Universal Visual Understanding	Apr 20, 2025	Language Modeling, Language Modelling	Unverified
OMoS-QA: A Dataset for Cross-Lingual Extractive Question Answering in a German Migration Context	Jul 22, 2024	Extractive Question-Answering, Language Modelling	Unverified
On a Benefit of Masked Language Model Pretraining: Robustness to Simplicity Bias	Nov 16, 2021	Language Modeling, Language Modelling	Unverified
On a Benefit of Mask Language Modeling: Robustness to Simplicity Bias	Oct 11, 2021	Hate Speech Detection, Language Modeling	Unverified
On Accurate Evaluation of GANs for Language Generation	Jun 13, 2018	Diversity, Language Modeling	Unverified
On Adversarial Examples for Biomedical NLP Tasks	Apr 23, 2020	Language Modeling, Language Modelling	Unverified
Automatic Text Extractive Summarization Based on Graph and Pre-trained Language Model Attention	Oct 10, 2021	Extractive Summarization, Language Modeling	Unverified
Once-Tuning-Multiple-Variants: Tuning Once and Expanded as Multiple Vision-Language Model Variants	Jan 1, 2025	Language Modeling, Language Modelling	Unverified
OncoGPT: A Medical Conversational Model Tailored with Oncology Domain Expertise on a Large Language Model Meta-AI (LLaMA)	Feb 26, 2024	Language Modeling, Language Modelling	Unverified
On Conditional and Compositional Language Model Differentiable Prompting	Jul 4, 2023	Few-Shot Learning, Language Modeling	Unverified
On Construction of the ASR-oriented Indian English Pronunciation Dictionary	May 1, 2020	Automatic Speech Recognition, Automatic Speech Recognition (ASR)	Unverified
On decoder-only architecture for speech-to-text and large language model integration	Jul 8, 2023	Decoder, Language Modeling	Unverified
On-Demand Distributional Semantic Distance and Paraphrasing	Jun 1, 2012	Document Summarization, Information Retrieval	Unverified
OneDiff: A Generalist Model for Image Difference Captioning	Jul 8, 2024	Language Modelling, model	Unverified
One Epoch Is All You Need	Jun 16, 2019	All, Language Modelling	Unverified
One In A Hundred: Select The Best Predicted Sequence from Numerous Candidates for Streaming Speech Recognition	Oct 28, 2020	Decoder, Diversity	Unverified
On Elastic Language Models	Nov 13, 2023	Information Retrieval, Knowledge Distillation	Unverified
One Model for All: Large Language Models are Domain-Agnostic Recommendation Systems	Oct 22, 2023	All, Language Modeling	Unverified
SkillNet-NLU: A Sparsely Activated Model for General-Purpose Natural Language Understanding	Mar 7, 2022	Language Modelling, Masked Language Modeling	Unverified
On Enhancing Root Cause Analysis with SQL Summaries for Failures in Database Workload Replays at SAP HANA	Dec 18, 2024	Language Modeling, Language Modelling	Unverified
One-shot Imitation in a Non-Stationary Environment via Multi-Modal Skill	Feb 13, 2024	Diversity, Imitation Learning	Unverified
One Teacher is Enough? Pre-trained Language Model Distillation from Multiple Teachers	Jun 2, 2021	Knowledge Distillation, Language Modeling	Unverified
ONE: Toward ONE model, ONE algorithm, ONE corpus dedicated to sentiment analysis of Arabic/Arabizi and its dialects	Apr 1, 2021	Language Modeling, Language Modelling	Unverified
One-vs-Each Approximation to Softmax for Scalable Estimation of Probabilities	Dec 1, 2016	General Classification, Language Modeling	Unverified
One-vs-Each Approximation to Softmax for Scalable Estimation of Probabilities	Sep 23, 2016	General Classification, Language Modeling	Unverified
On Fairness of Unified Multimodal Large Language Model for Image Generation	Feb 5, 2025	Fairness, Image Generation	Unverified
On Functional Activations in Deep Neural Networks	Nov 17, 2023	Language Modelling, Large Language Model	Unverified
On Improving Deep Learning Trace Analysis with System Call Arguments	Mar 11, 2021	Deep Learning, Language Modelling	Unverified
On Improving Informativity and Grammaticality for Multi-Sentence Compression	May 7, 2016	Language Modeling, Language Modelling	Unverified
On Language Model Integration for RNN Transducer based Speech Recognition	Oct 13, 2021	Language Modeling, Language Modelling	Unverified
On Languaging a Simulation Engine	Feb 26, 2024	Diversity, Language Modeling	Unverified
On Learning Better Embeddings from Chinese Clinical Records: Study on Combining In-Domain and Out-Domain Data	Jul 1, 2018	Disease Prediction, Information Retrieval	Unverified
On Learning Universal Representations Across Languages	Jul 31, 2020	Contrastive Learning, Cross-Lingual Natural Language Inference	Unverified
Online Infix Probability Computation for Probabilistic Finite Automata	Jul 1, 2019	Language Modeling, Language Modelling	Unverified
Online optimisation of log-linear weights in interactive machine translation	May 1, 2014	Language Modelling, Machine Translation	Unverified
Online Representation Learning in Recurrent Neural Language Models	Aug 16, 2015	Language Modelling, Representation Learning	Unverified
On Machine Learning Approaches for Protein-Ligand Binding Affinity Prediction	Jul 15, 2024	Active Learning, Benchmarking	Unverified
On Measuring Social Biases in Prompt-Based Learning	Jan 16, 2022	Form, Language Modelling	Unverified
On Mechanistic Circuits for Extractive Question-Answering	Feb 12, 2025	Extractive Question-Answering, Language Modeling	Unverified
On Minimum Word Error Rate Training of the Hybrid Autoregressive Transducer	Oct 23, 2020	Language Modeling, Language Modelling	Unverified
On Modeling Sense Relatedness in Multi-prototype Word Embedding	Nov 1, 2017	Clustering, Language Modeling	Unverified
On Modular Training of Neural Acoustics-to-Word Model for LVCSR	Mar 3, 2018	Automatic Speech Recognition, Automatic Speech Recognition (ASR)	Unverified
On Multilingual Encoder Language Model Compression for Low-Resource Languages	May 22, 2025	Knowledge Distillation, Language Modeling	Unverified
On Multiplicative Integration with Recurrent Neural Networks	Jun 21, 2016	Language Modelling	Unverified
On Overcoming Miscalibrated Conversational Priors in LLM-based Chatbots	Jun 1, 2024	Language Modeling, Language Modelling	Unverified
On Privacy and Confidentiality of Communications in Organizational Graphs	May 27, 2021	Language Modelling	Unverified
On Randomized Classification Layers and Their Implications in Natural Language Generation	Jun 1, 2021	Image Captioning, Language Modeling	Unverified
On Reducing Repetition in Abstractive Summarization	Sep 1, 2021	Abstractive Text Summarization, Informativeness	Unverified
On Retrieval Augmentation and the Limitations of Language Model Training	Nov 16, 2023	Language Modeling, Language Modelling	Unverified
On Reward Maximization and Distribution Matching for Fine-Tuning Language Models	Sep 29, 2021	Language Modelling, Reinforcement Learning (RL)	Unverified

Datasets: WikiText-103, Penn Treebank (Word Level), enwik8, The Pile, WikiText-2, LAMBADA, One Billion Word, Text8, Penn Treebank (Character Level), Hutter Prize, OpenWebText, SALMon

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	Decay RNN	Validation perplexity	76.67	—	Unverified
2	GRU	Validation perplexity	53.78	—	Unverified
3	LSTM	Validation perplexity	52.73	—	Unverified
4	LSTM	Test perplexity	48.7	—	Unverified
5	Temporal CNN	Test perplexity	45.2	—	Unverified
6	TCN	Test perplexity	45.19	—	Unverified
7	GCNN-8	Test perplexity	44.9	—	Unverified
8	Neural cache model (size = 100)	Test perplexity	44.8	—	Unverified
9	Neural cache model (size = 2,000)	Test perplexity	40.8	—	Unverified
10	GPT-2 Small	Test perplexity	37.5	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	TCN	Test perplexity	108.47	—	Unverified
2	Seq-U-Net	Test perplexity	107.95	—	Unverified
3	GRU (Bai et al., 2018)	Test perplexity	92.48	—	Unverified
4	R-Transformer	Test perplexity	84.38	—	Unverified
5	Zaremba et al. (2014) - LSTM (medium)	Test perplexity	82.7	—	Unverified
6	Gal & Ghahramani (2016) - Variational LSTM (medium)	Test perplexity	79.7	—	Unverified
7	LSTM (Bai et al., 2018)	Test perplexity	78.93	—	Unverified
8	Zaremba et al. (2014) - LSTM (large)	Test perplexity	78.4	—	Unverified
9	Gal & Ghahramani (2016) - Variational LSTM (large)	Test perplexity	75.2	—	Unverified
10	Inan et al. (2016) - Variational RHN	Test perplexity	66	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	LSTM (7 layers)	Bit per Character (BPC)	1.67	—	Unverified
2	Hypernetworks	Bit per Character (BPC)	1.34	—	Unverified
3	SHA-LSTM (4 layers, h=1024, no attention head)	Bit per Character (BPC)	1.33	—	Unverified
4	LN HM-LSTM	Bit per Character (BPC)	1.32	—	Unverified
5	ByteNet	Bit per Character (BPC)	1.31	—	Unverified
6	Recurrent Highway Networks	Bit per Character (BPC)	1.27	—	Unverified
7	Large FS-LSTM-4	Bit per Character (BPC)	1.25	—	Unverified
8	Large mLSTM	Bit per Character (BPC)	1.24	—	Unverified
9	AWD-LSTM (3 layers)	Bit per Character (BPC)	1.23	—	Unverified
10	Cluster-Former (#C=512)	Bit per Character (BPC)	1.22	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	Smaller Transformer 126M (pre-trained)	Test perplexity	33	—	Unverified
2	OPT 125M	Test perplexity	32.26	—	Unverified
3	Larger Transformer 771M (pre-trained)	Test perplexity	28.1	—	Unverified
4	OPT 1.3B	Test perplexity	19.55	—	Unverified
5	GPT-Neo 125M	Test perplexity	17.83	—	Unverified
6	OPT 2.7B	Test perplexity	17.81	—	Unverified
7	Smaller Transformer 126M (fine-tuned)	Test perplexity	12	—	Unverified
8	GPT-Neo 1.3B	Test perplexity	11.46	—	Unverified
9	Transformer 125M	Test perplexity	10.7	—	Unverified
10	GPT-Neo 2.7B	Test perplexity	10.44	—	Unverified
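
The tables above report perplexity and bits per character (BPC). For reference, the sketch below computes both from the probabilities a model assigns to each observed token or character, using the standard definitions: perplexity is the exponential of the mean negative log-likelihood, and BPC is the mean negative base-2 log-likelihood per character. The function names and the toy probabilities are assumptions for illustration. The tokenization and evaluation setup behind each listed result is not stated here, so values from different rows or tables may not be directly comparable.

```python
import math

def perplexity(token_probs):
    """exp of the mean negative natural-log likelihood per token."""
    nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(nll)

def bits_per_character(char_probs):
    """Mean negative base-2 log likelihood per character."""
    return -sum(math.log2(p) for p in char_probs) / len(char_probs)

# Toy example: a model that assigns probability 0.25 to every observed symbol.
probs = [0.25, 0.25, 0.25, 0.25]
ppl = perplexity(probs)          # uniform over 4 choices gives perplexity 4
bpc = bits_per_character(probs)  # log2(4) = 2 bits per symbol
```

Lower is better for both metrics. When both are measured over the same symbols, perplexity equals 2 raised to the BPC.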