Language Modelling

A language model is a model of natural language. Language models are useful for a variety of tasks, including speech recognition, machine translation, natural language generation (generating more human-like text), optical character recognition, route optimization, handwriting recognition, grammar induction, and information retrieval.

Large language models (LLMs), currently their most advanced form, are predominantly based on transformers trained on larger datasets (frequently using words scraped from the public internet). They have superseded recurrent neural network-based models, which had previously superseded purely statistical models such as the word n-gram language model.

Source: Wikipedia
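
The word n-gram models mentioned above estimate the probability of the next word from counts of the preceding n−1 words. A minimal bigram (n = 2) sketch in Python, assuming whitespace tokenization, `<s>`/`</s>` sentence-boundary markers and add-one smoothing; these choices and the function names are illustrative, not taken from any particular library:

```python
from collections import Counter, defaultdict

def train_bigram(sentences):
    """Count word bigrams, padding each sentence with <s> and </s> markers."""
    bigrams = defaultdict(Counter)
    vocab = set()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        vocab.update(tokens)
        for prev, cur in zip(tokens, tokens[1:]):
            bigrams[prev][cur] += 1
    return bigrams, vocab

def bigram_prob(bigrams, vocab, prev, cur):
    """P(cur | prev) with add-one (Laplace) smoothing over the vocabulary."""
    # .get avoids inserting an empty entry for an unseen context word
    counts = bigrams.get(prev, Counter())
    return (counts[cur] + 1) / (sum(counts.values()) + len(vocab))

corpus = ["the cat sat", "the dog sat", "the cat ran"]
bigrams, vocab = train_bigram(corpus)
print(bigram_prob(bigrams, vocab, "the", "cat"))  # (2 + 1) / (3 + 7) = 0.3
```

Add-one smoothing is used only because it is the simplest way to give unseen word pairs a non-zero probability; practical n-gram models typically use better-performing schemes such as Kneser-Ney smoothing.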

Papers

Showing 701–750 of 17610 papers

Title	Date	Tasks	Status	Hype
Efficiently Learning at Test-Time: Active Fine-Tuning of LLMs	Oct 10, 2024	Active Learning, Language Modeling	Code available	2
Q-VLM: Post-training Quantization for Large Vision-Language Models	Oct 10, 2024	Language Modeling, Language Modelling	Code available	2
OneRef: Unified One-tower Expression Grounding and Segmentation with Mask Referring Modeling	Oct 10, 2024	Language Modeling, Language Modelling	Code available	2
Sylber: Syllabic Embedding Representation of Speech from Raw Audio	Oct 9, 2024	Language Modeling, Language Modelling	Code available	2
Towards Interpreting Visual Information Processing in Vision-Language Models	Oct 9, 2024	Language Modeling, Language Modelling	Code available	2
Compositional Entailment Learning for Hyperbolic Vision-Language Models	Oct 9, 2024	Language Modelling, Representation Learning	Code available	2
Think While You Generate: Discrete Diffusion with Planned Denoising	Oct 8, 2024	Denoising, Image Generation	Code available	2
BUMBLE: Unifying Reasoning and Acting with Vision-Language Models for Building-wide Mobile Manipulation	Oct 8, 2024	Language Modeling, Language Modelling	Code available	2
PDF-WuKong: A Large Multimodal Model for Efficient Long PDF Reading with End-to-End Sparse Sampling	Oct 8, 2024	document understanding, Language Modeling	Code available	2
TextHawk2: A Large Vision-Language Model Excels in Bilingual OCR and Grounding with 16x Fewer Tokens	Oct 7, 2024	Language Modeling, Language Modelling	Code available	2
Differential Transformer	Oct 7, 2024	Hallucination, In-Context Learning	Code available	2
Mitigating Modality Prior-Induced Hallucinations in Multimodal Large Language Models via Deciphering Attention Causality	Oct 7, 2024	Causal Inference, counterfactual	Code available	2
GenSim: A General Social Simulation Platform with Large Language Model based Agents	Oct 6, 2024	Language Modeling, Language Modelling	Code available	2
SyllableLM: Learning Coarse Semantic Units for Speech Language Models	Oct 5, 2024	Clustering, Language Modeling	Code available	2
A Simple yet Effective Training-free Prompt-free Approach to Chinese Spelling Correction Based on Large Language Models	Oct 5, 2024	Language Modeling, Language Modelling	Code available	2
Autoregressive Action Sequence Learning for Robotic Manipulation	Oct 4, 2024	Chunking, Language Modeling	Code available	2
NNetscape Navigator: Complex Demonstrations for Web Agents Without a Demonstrator	Oct 3, 2024	Language Modeling, Language Modelling	Code available	2
Leopard: A Vision Language Model For Text-Rich Multi-Image Tasks	Oct 2, 2024	Language Modeling, Language Modelling	Code available	2
Robin3D: Improving 3D Large Language Model via Robust Instruction Tuning	Sep 30, 2024	Instruction Following, Language Modeling	Code available	2
FaithEval: Can Your Language Model Stay Faithful to Context, Even If "The Moon is Made of Marshmallows"	Sep 30, 2024	counterfactual, Hallucination	Code available	2
DeSTA2: Developing Instruction-Following Speech Language Model Without Speech Instruction-Tuning Data	Sep 30, 2024	Instruction Following, Language Modeling	Code available	2
LLMEmb: Large Language Model Can Be a Good Embedding Generator for Sequential Recommendation	Sep 30, 2024	Attribute, Collaborative Filtering	Code available	2
One Token to Seg Them All: Language Instructed Reasoning Segmentation in Videos	Sep 29, 2024	All, Image Segmentation	Code available	2
Control Industrial Automation System with Large Language Model Agents	Sep 26, 2024	Language Modeling, Language Modelling	Code available	2
Empirical Asset Pricing with Large Language Model Agents	Sep 25, 2024	Language Modeling, Language Modelling	Code available	2
Small Language Models: Survey, Measurements, and Insights	Sep 24, 2024	Benchmarking, Decoder	Code available	2
EEGUnity: Open-Source Tool in Facilitating Unified EEG Datasets Towards Large-Scale EEG Model	Sep 24, 2024	EEG, Electroencephalogram (EEG)	Code available	2
MobileVLM: A Vision-Language Model for Better Intra- and Inter-UI Understanding	Sep 23, 2024	Language Modeling, Language Modelling	Code available	2
Diabetica: Adapting Large Language Model to Enhance Multiple Medical Tasks in Diabetes Care and Management	Sep 20, 2024	Language Modeling, Language Modelling	Code available	2
Scaling Smart: Accelerating Large Language Model Pre-training with Small Model Initialization	Sep 19, 2024	GPU, Language Modeling	Code available	2
Iteration of Thought: Leveraging Inner Dialogue for Autonomous Large Language Model Reasoning	Sep 19, 2024	Language Modeling, Language Modelling	Code available	2
Towards Interactive and Learnable Cooperative Driving Automation: a Large Language Model-Driven Decision-Making Framework	Sep 19, 2024	Autonomous Vehicles, Decision Making	Code available	2
AutoVerus: Automated Proof Generation for Rust Code	Sep 19, 2024	Code Generation, Language Modeling	Code available	2
LLaQo: Towards a Query-Based Coach in Expressive Music Performance Assessment	Sep 13, 2024	Language Modeling, Language Modelling	Code available	2
Large Language Model Can Transcribe Speech in Multi-Talker Scenarios with Versatile Instructions	Sep 13, 2024	Automatic Speech Recognition, Automatic Speech Recognition (ASR)	Code available	2
Synthetic continued pretraining	Sep 11, 2024	Data Augmentation, Language Modelling	Code available	2
MiniDrive: More Efficient Vision-Language Models with Multi-Level 2D Features as Text Tokens for Autonomous Driving	Sep 11, 2024	Autonomous Driving, Feature Engineering	Code available	2
DetailCLIP: Detail-Oriented CLIP for Fine-Grained Tasks	Sep 10, 2024	Contrastive Learning, Image Reconstruction	Code available	2
TransformerRanker: A Tool for Efficiently Finding the Best-Suited Language Models for Downstream Classification Tasks	Sep 9, 2024	Classification, Language Modeling	Code available	2
The AdEMAMix Optimizer: Better, Faster, Older	Sep 5, 2024	image-classification, Image Classification	Code available	2
Language Model Powered Digital Biology with BRAD	Sep 4, 2024	Chatbot, Code Generation	Code available	2
EnCLAP++: Analyzing the EnCLAP Framework for Optimizing Automated Audio Captioning Performance	Sep 2, 2024	AudioCaps, Audio captioning	Code available	2
Sample-Efficient Diffusion for Text-To-Speech Synthesis	Sep 1, 2024	Language Modeling, Language Modelling	Code available	2
SAM4MLLM: Enhance Multi-Modal Large Language Model for Referring Expression Segmentation	Sep 1, 2024	Language Modeling, Language Modelling	Code available	2
MemLong: Memory-Augmented Retrieval for Long Text Modeling	Aug 30, 2024	4k, Decoder	Code available	2
Law of Vision Representation in MLLMs	Aug 29, 2024	cross-modal alignment, Language Modeling	Code available	2
Efficient LLM Scheduling by Learning to Rank	Aug 28, 2024	Blocking, Chatbot	Code available	2
LLM Defenses Are Not Robust to Multi-Turn Human Jailbreaks Yet	Aug 27, 2024	Language Modeling, Language Modelling	Code available	2
MLR-Copilot: Autonomous Machine Learning Research based on Large Language Models Agents	Aug 26, 2024	Language Modeling, Language Modelling	Code available	2
LLMs as Zero-shot Graph Learners: Alignment of GNN Representations with LLM Token Embeddings	Aug 25, 2024	Language Modelling, Link Prediction	Code available	2

Datasets: WikiText-103, Penn Treebank (Word Level), enwik8, The Pile, WikiText-2, LAMBADA, One Billion Word, Text8, Penn Treebank (Character Level), Hutter Prize, OpenWebText, SALMon

Benchmark Results

WikiText-103

#	Model	Metric	Claimed	Verified	Status
1	Decay RNN	Validation perplexity	76.67	—	Unverified
2	GRU	Validation perplexity	53.78	—	Unverified
3	LSTM	Validation perplexity	52.73	—	Unverified
4	LSTM	Test perplexity	48.7	—	Unverified
5	Temporal CNN	Test perplexity	45.2	—	Unverified
6	TCN	Test perplexity	45.19	—	Unverified
7	GCNN-8	Test perplexity	44.9	—	Unverified
8	Neural cache model (size = 100)	Test perplexity	44.8	—	Unverified
9	Neural cache model (size = 2,000)	Test perplexity	40.8	—	Unverified
10	GPT-2 Small	Test perplexity	37.5	—	Unverified
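
Perplexity, the metric in the word-level tables, is the exponential of the average negative log-likelihood a model assigns to each token of the evaluation text, so lower is better. A minimal Python sketch, assuming the per-token probabilities have already been obtained from some model; the inputs below are made up for illustration and are not taken from any listed result:

```python
import math

def perplexity(token_probs):
    """exp of the mean negative log-likelihood (natural log) over the evaluated tokens."""
    if not token_probs:
        raise ValueError("need at least one token probability")
    mean_nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(mean_nll)

# A model that gives every token probability 1/40 is as uncertain as a
# uniform choice among 40 words, so its perplexity is 40 (up to float rounding).
print(perplexity([1 / 40] * 100))
# Probabilities 0.5 and 0.125: exp((ln 2 + ln 8) / 2) = sqrt(16) = 4.
print(perplexity([0.5, 0.125]))
```

Because perplexity is an average over tokens, values are only comparable between models evaluated on the same test set with the same tokenization.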

Penn Treebank (Word Level)

#	Model	Metric	Claimed	Verified	Status
1	TCN	Test perplexity	108.47	—	Unverified
2	Seq-U-Net	Test perplexity	107.95	—	Unverified
3	GRU (Bai et al., 2018)	Test perplexity	92.48	—	Unverified
4	R-Transformer	Test perplexity	84.38	—	Unverified
5	Zaremba et al. (2014) - LSTM (medium)	Test perplexity	82.7	—	Unverified
6	Gal & Ghahramani (2016) - Variational LSTM (medium)	Test perplexity	79.7	—	Unverified
7	LSTM (Bai et al., 2018)	Test perplexity	78.93	—	Unverified
8	Zaremba et al. (2014) - LSTM (large)	Test perplexity	78.4	—	Unverified
9	Gal & Ghahramani (2016) - Variational LSTM (large)	Test perplexity	75.2	—	Unverified
10	Inan et al. (2016) - Variational RHN	Test perplexity	66	—	Unverified

enwik8

#	Model	Metric	Claimed	Verified	Status
1	LSTM (7 layers)	Bits per character (BPC)	1.67	—	Unverified
2	Hypernetworks	Bits per character (BPC)	1.34	—	Unverified
3	SHA-LSTM (4 layers, h=1024, no attention head)	Bits per character (BPC)	1.33	—	Unverified
4	LN HM-LSTM	Bits per character (BPC)	1.32	—	Unverified
5	ByteNet	Bits per character (BPC)	1.31	—	Unverified
6	Recurrent Highway Networks	Bits per character (BPC)	1.27	—	Unverified
7	Large FS-LSTM-4	Bits per character (BPC)	1.25	—	Unverified
8	Large mLSTM	Bits per character (BPC)	1.24	—	Unverified
9	AWD-LSTM (3 layers)	Bits per character (BPC)	1.23	—	Unverified
10	Cluster-Former (#C=512)	Bits per character (BPC)	1.22	—	Unverified
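
The character-level table uses bits per character (BPC): the average negative base-2 log-probability a model assigns to each character, which is the base-2 logarithm of character-level perplexity. A sketch in the same style, again assuming per-character probabilities are supplied as input; the function names are hypothetical:

```python
import math

def bits_per_character(char_probs):
    """Mean negative base-2 log-probability per character."""
    if not char_probs:
        raise ValueError("need at least one character probability")
    return -sum(math.log2(p) for p in char_probs) / len(char_probs)

def char_perplexity(bpc):
    """Character-level perplexity is 2 raised to the BPC."""
    return 2.0 ** bpc

print(bits_per_character([0.5, 0.25]))  # (1 + 2) / 2 = 1.5
print(round(char_perplexity(1.22), 2))  # 2.33
```

For example, the 1.22 BPC claimed for Cluster-Former corresponds to a character-level perplexity of about 2.33.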

The Pile

#	Model	Metric	Claimed	Verified	Status
1	Smaller Transformer 126M (pre-trained)	Test perplexity	33	—	Unverified
2	OPT 125M	Test perplexity	32.26	—	Unverified
3	Larger Transformer 771M (pre-trained)	Test perplexity	28.1	—	Unverified
4	OPT 1.3B	Test perplexity	19.55	—	Unverified
5	GPT-Neo 125M	Test perplexity	17.83	—	Unverified
6	OPT 2.7B	Test perplexity	17.81	—	Unverified
7	Smaller Transformer 126M (fine-tuned)	Test perplexity	12	—	Unverified
8	GPT-Neo 1.3B	Test perplexity	11.46	—	Unverified
9	Transformer 125M	Test perplexity	10.7	—	Unverified
10	GPT-Neo 2.7B	Test perplexity	10.44	—	Unverified