SOTAVerified

Language Modelling

A language model is a probabilistic model of natural language: it assigns probabilities to sequences of words. Language models are useful for a variety of tasks, including speech recognition, machine translation, natural language generation (producing human-like text), optical character recognition, handwriting recognition, grammar induction, and information retrieval.

Large language models (LLMs), currently their most advanced form, are predominantly based on transformers trained on large datasets, frequently text scraped from the public internet. They have superseded recurrent neural network-based models, which had in turn superseded purely statistical models such as the word n-gram language model.

Source: Wikipedia
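
To make the n-gram idea above concrete, here is a minimal sketch of a bigram (word 2-gram) language model with add-one smoothing. The toy corpus and function names are purely illustrative and not taken from any system listed on this page.

```python
from collections import Counter

def train_bigram_lm(tokens):
    """Count unigrams and bigrams in a token list."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    return unigrams, bigrams

def bigram_prob(unigrams, bigrams, prev, word, vocab_size):
    """P(word | prev) with add-one (Laplace) smoothing."""
    return (bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size)

# Toy corpus; real n-gram models were trained on millions of tokens.
tokens = "the cat sat on the mat the cat ran".split()
unigrams, bigrams = train_bigram_lm(tokens)
vocab = len(unigrams)
print(bigram_prob(unigrams, bigrams, "the", "cat", vocab))  # seen bigram: ~0.33
print(bigram_prob(unigrams, bigrams, "the", "ran", vocab))  # unseen bigram: ~0.11
```

Smoothing is what separates even this toy model from a raw frequency table: unseen bigrams receive a small nonzero probability instead of zero, which is essential when evaluating perplexity on held-out text.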

Papers

Showing 15151–15200 of 17610 papers

Title | Status | Hype
Promoting Open-domain Dialogue Generation through Learning Pattern Information between Contexts and Responses | Code | 0
LLM-RankFusion: Mitigating Intrinsic Inconsistency in LLM-based Ranking | Code | 0
Runaway is Ashamed, But Helpful: On the Early-Exit Behavior of Large Language Model-based Agents in Embodied Environments | Code | 0
Toward a Thermodynamics of Meaning | Code | 0
RuOpinionNE-2024: Extraction of Opinion Tuples from Russian News Texts | Code | 0
Russian Language Datasets in the Digital Humanities Domain and Their Evaluation with Word Embeddings | Code | 0
Russian Natural Language Generation: Creation of a Language Modelling Dataset and Evaluation with Modern Neural Architectures | Code | 0
Promoting Exploration in Memory-Augmented Adam using Critical Momenta | Code | 0
The Distributional Hypothesis Does Not Fully Explain the Benefits of Masked Language Model Pretraining | Code | 0
Prometheus Chatbot: Knowledge Graph Collaborative Large Language Model for Computer Components Recommendation | Code | 0
ProMap: Effective Bilingual Lexicon Induction via Language Model Prompting | Code | 0
Multimodal data matters: language model pre-training over structured and unstructured electronic health records | Code | 0
Project SHADOW: Symbolic Higher-order Associative Deductive reasoning On Wikidata using LM probing | Code | 0
LLM-QFL: Distilling Large Language Model for Quantum Federated Learning | Code | 0
S2SNet: A Pretrained Neural Network for Superconductivity Discovery | Code | 0
Language Generation via Combinatorial Constraint Satisfaction: A Tree Search Enhanced Monte-Carlo Approach | Code | 0
S3: A Simple Strong Sample-effective Multimodal Dialog System | Code | 0
LASMP: Language Aided Subset Sampling Based Motion Planner | Code | 0
JCSE: Contrastive Learning of Japanese Sentence Embeddings and Its Applications | Code | 0
Just ClozE! A Novel Framework for Evaluating the Factual Consistency Faster in Abstractive Summarization | Code | 0
Projective Methods for Mitigating Gender Bias in Pre-trained Language Models | Code | 0
The Effectiveness of Masked Language Modeling and Adapters for Factual Knowledge Injection | Code | 0
Public Attitudes Toward ChatGPT on Twitter: Sentiments, Topics, and Occupations | Code | 0
The Effect of Different Writing Tasks on Linguistic Style: A Case Study of the ROC Story Cloze Task | Code | 0
Story Ending Prediction by Transferable BERT | Code | 0
More Room for Language: Investigating the Effect of Retrieval on Language Models | Code | 0
StrassenNets: Deep Learning with a Multiplication Budget | Code | 0
Progressive Distillation Based on Masked Generation Feature Method for Knowledge Graph Completion | Code | 0
Progressive Class Semantic Matching for Semi-supervised Text Classification | Code | 0
Profiling Patient Transcript Using Large Language Model Reasoning Augmentation for Alzheimer's Disease Detection | Code | 0
SafetyPrompts: a Systematic Review of Open Datasets for Evaluating and Improving Large Language Model Safety | Code | 0
The effect of fine-tuning on language model toxicity | Code | 0
Professor Forcing: A New Algorithm for Training Recurrent Networks | Code | 0
LLMPC: Large Language Model Predictive Control | Code | 0
Learning to Maximize Mutual Information for Chain-of-Thought Distillation | Code | 0
More RLHF, More Trust? On The Impact of Preference Alignment On Trustworthiness | Code | 0
Product Information Extraction using ChatGPT | Code | 0
Procedural Dilemma Generation for Evaluating Moral Reasoning in Humans and Language Models | Code | 0
Learning to Locate Visual Answer in Video Corpus Using Question | Code | 0
Streaming Joint Speech Recognition and Disfluency Detection | Code | 0
The Effects of In-domain Corpus Size on pre-training BERT | Code | 0
MoRE-LLM: Mixture of Rule Experts Guided by a Large Language Model | Code | 0
SALM: A Multi-Agent Framework for Language Model-Driven Social Network Simulation | Code | 0
Problem-Solving in Language Model Networks | Code | 0
Probing the Robustness Properties of Neural Speech Codecs | Code | 0
LLM Honeypot: Leveraging Large Language Models as Advanced Interactive Honeypot Systems | Code | 0
More Expressive Attention with Negative Weights | Code | 0
The emergence of number and syntax units in LSTM language models | Code | 0
Probing the Capacity of Language Model Agents to Operationalize Disparate Experiential Context Despite Distraction | Code | 0
Probing Simile Knowledge from Pre-trained Language Models | Code | 0
Page 304 of 353

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | Decay RNN | Validation perplexity | 76.67 | – | Unverified
2 | GRU | Validation perplexity | 53.78 | – | Unverified
3 | LSTM | Validation perplexity | 52.73 | – | Unverified
4 | LSTM | Test perplexity | 48.7 | – | Unverified
5 | Temporal CNN | Test perplexity | 45.2 | – | Unverified
6 | TCN | Test perplexity | 45.19 | – | Unverified
7 | GCNN-8 | Test perplexity | 44.9 | – | Unverified
8 | Neural cache model (size = 100) | Test perplexity | 44.8 | – | Unverified
9 | Neural cache model (size = 2,000) | Test perplexity | 40.8 | – | Unverified
10 | GPT-2 Small | Test perplexity | 37.5 | – | Unverified
# | Model | Metric | Claimed | Verified | Status
1 | TCN | Test perplexity | 108.47 | – | Unverified
2 | Seq-U-Net | Test perplexity | 107.95 | – | Unverified
3 | GRU (Bai et al., 2018) | Test perplexity | 92.48 | – | Unverified
4 | R-Transformer | Test perplexity | 84.38 | – | Unverified
5 | Zaremba et al. (2014) - LSTM (medium) | Test perplexity | 82.7 | – | Unverified
6 | Gal & Ghahramani (2016) - Variational LSTM (medium) | Test perplexity | 79.7 | – | Unverified
7 | LSTM (Bai et al., 2018) | Test perplexity | 78.93 | – | Unverified
8 | Zaremba et al. (2014) - LSTM (large) | Test perplexity | 78.4 | – | Unverified
9 | Gal & Ghahramani (2016) - Variational LSTM (large) | Test perplexity | 75.2 | – | Unverified
10 | Inan et al. (2016) - Variational RHN | Test perplexity | 66 | – | Unverified
# | Model | Metric | Claimed | Verified | Status
1 | LSTM (7 layers) | Bits per Character (BPC) | 1.67 | – | Unverified
2 | Hypernetworks | Bits per Character (BPC) | 1.34 | – | Unverified
3 | SHA-LSTM (4 layers, h=1024, no attention head) | Bits per Character (BPC) | 1.33 | – | Unverified
4 | LN HM-LSTM | Bits per Character (BPC) | 1.32 | – | Unverified
5 | ByteNet | Bits per Character (BPC) | 1.31 | – | Unverified
6 | Recurrent Highway Networks | Bits per Character (BPC) | 1.27 | – | Unverified
7 | Large FS-LSTM-4 | Bits per Character (BPC) | 1.25 | – | Unverified
8 | Large mLSTM | Bits per Character (BPC) | 1.24 | – | Unverified
9 | AWD-LSTM (3 layers) | Bits per Character (BPC) | 1.23 | – | Unverified
10 | Cluster-Former (#C=512) | Bits per Character (BPC) | 1.22 | – | Unverified
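
Bits per character, the metric in the table above, is the mean negative log2-probability a character-level model assigns to each character of the test text; character-level perplexity is 2^BPC, so lower is better. A minimal sketch of the computation, with made-up probabilities standing in for real model outputs:

```python
import math

def bits_per_character(char_probs):
    """Mean negative log2-probability over the characters of a text."""
    return -sum(math.log2(p) for p in char_probs) / len(char_probs)

# Hypothetical per-character probabilities from some character-level model.
probs = [0.5, 0.25, 0.4, 0.8]
bpc = bits_per_character(probs)
print(f"BPC: {bpc:.3f}")                         # lower is better
print(f"Char-level perplexity: {2 ** bpc:.3f}")  # perplexity = 2**BPC
```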
# | Model | Metric | Claimed | Verified | Status
1 | Smaller Transformer 126M (pre-trained) | Test perplexity | 33 | – | Unverified
2 | OPT 125M | Test perplexity | 32.26 | – | Unverified
3 | Larger Transformer 771M (pre-trained) | Test perplexity | 28.1 | – | Unverified
4 | OPT 1.3B | Test perplexity | 19.55 | – | Unverified
5 | GPT-Neo 125M | Test perplexity | 17.83 | – | Unverified
6 | OPT 2.7B | Test perplexity | 17.81 | – | Unverified
7 | Smaller Transformer 126M (fine-tuned) | Test perplexity | 12 | – | Unverified
8 | GPT-Neo 1.3B | Test perplexity | 11.46 | – | Unverified
9 | Transformer 125M | Test perplexity | 10.7 | – | Unverified
10 | GPT-Neo 2.7B | Test perplexity | 10.44 | – | Unverified
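
The perplexity figures reported throughout these tables are the exponential of a model's average per-token negative log-likelihood on the held-out set; lower is better, and a uniform model over a vocabulary of size V scores exactly V. A minimal sketch, with invented per-token probabilities in place of real model outputs:

```python
import math

def perplexity(token_probs):
    """exp of the average negative log-likelihood per token."""
    nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(nll)

# Invented probabilities a model might assign to five test tokens.
probs = [0.1, 0.05, 0.2, 0.02, 0.15]
print(f"Test perplexity: {perplexity(probs):.2f}")

# Sanity check: a uniform model over a 10,000-word vocabulary assigns
# p = 1/10000 to every token, giving perplexity exactly 10000.
print(f"{perplexity([1 / 10000] * 5):.1f}")  # 10000.0
```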