Language Modelling

A language model is a model of natural language. Language models are useful for a variety of tasks, including speech recognition, machine translation, natural language generation (generating more human-like text), optical character recognition, route optimization, handwriting recognition, grammar induction, and information retrieval.

Large language models (LLMs), currently their most advanced form, are predominantly based on transformers trained on larger datasets (frequently using words scraped from the public internet). They have superseded recurrent neural network-based models, which had previously superseded purely statistical models such as the word n-gram language model.

Source: Wikipedia
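
To make the "purely statistical" end of that progression concrete, here is a minimal sketch of a word bigram (2-gram) language model in Python. It is illustrative only and not taken from any paper listed on this page: the add-one (Laplace) smoothing, the `<s>` and `</s>` boundary markers, the function names, and the toy corpus are all assumptions chosen for brevity.

```python
from collections import Counter


def train_bigram(sentences):
    """Count contexts and bigrams over tokenized sentences.

    Each sentence is padded with <s> and </s> so the model can also
    learn which words start and end a sentence.
    """
    contexts, bigrams = Counter(), Counter()
    for tokens in sentences:
        padded = ["<s>"] + tokens + ["</s>"]
        contexts.update(padded[:-1])                   # every token that has a successor
        bigrams.update(zip(padded[:-1], padded[1:]))   # (previous, next) pairs
    # Words the model can predict: every corpus word plus the end marker.
    vocab = {w for tokens in sentences for w in tokens} | {"</s>"}
    return contexts, bigrams, vocab


def bigram_prob(prev, word, contexts, bigrams, vocab):
    """Estimate P(word | prev) with add-one smoothing (an assumed choice)."""
    return (bigrams[(prev, word)] + 1) / (contexts[prev] + len(vocab))


# Hypothetical toy corpus, for illustration only.
corpus = [["the", "cat", "sat"], ["the", "dog", "sat"]]
contexts, bigrams, vocab = train_bigram(corpus)
print(bigram_prob("the", "cat", contexts, bigrams, vocab))
```

With smoothing, pairs that never occur in the corpus still receive a small non-zero probability, and the probabilities of all possible next words after a given context still sum to one.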

Papers

Showing 9751–9800 of 17610 papers

Title	Date	Tasks	Status	Hype
How Predictable Are Large Language Model Capabilities? A Case Study on BIG-bench	May 24, 2023	Diversity, Language Modeling	Code Available	0
Dynamic Masking Rate Schedules for MLM Pretraining	May 24, 2023	Language Modeling, Language Modelling	Unverified	0
Estimating class separability of text embeddings with persistent homology	May 24, 2023	Language Modelling, Multi Class Text Classification	Unverified	0
An Efficient Multilingual Language Model Compression through Vocabulary Trimming	May 24, 2023	Language Modeling, Language Modelling	Code Available	1
C-STS: Conditional Semantic Textual Similarity	May 24, 2023	Information Retrieval, Language Model Evaluation	Code Available	1
Calc-X and Calcformers: Empowering Arithmetical Chain-of-Thought through Interaction with Symbolic Systems	May 24, 2023	Arithmetic Reasoning, GSM8K	Code Available	0
Winner-Take-All Column Row Sampling for Memory Efficient Adaptation of Language Model	May 24, 2023	All, Language Modeling	Code Available	0
Trade-Offs Between Fairness and Privacy in Language Modeling	May 24, 2023	Bias Detection, Fairness	Code Available	0
Textless Speech-to-Speech Translation With Limited Parallel Data	May 24, 2023	Automatic Speech Recognition, Denoising	Code Available	0
Focus Your Attention (with Adaptive IIR Filters)	May 24, 2023	Language Modelling, Long-range modeling	Unverified	0
Mitigating Test-Time Bias for Fair Image Retrieval	May 23, 2023	Image Retrieval, Language Modeling	Code Available	0
QLoRA: Efficient Finetuning of Quantized LLMs	May 23, 2023	Chatbot, GPU	Code Available	6
Regex-augmented Domain Transfer Topic Classification based on a Pre-trained Language Model: An application in Financial Domain	May 23, 2023	Language Modeling, Language Modelling	Unverified	0
Natural Language Decompositions of Implicit Content Enable Better Text Representations	May 23, 2023	Language Modeling, Language Modelling	Code Available	0
RetICL: Sequential Retrieval of In-Context Examples with Reinforcement Learning	May 23, 2023	In-Context Learning, Language Modelling	Code Available	1
Language Model Self-improvement by Reinforcement Learning Contemplation	May 23, 2023	Language Modeling, Language Modelling	Unverified	0
Parameter-Efficient Language Model Tuning with Active Learning in Low-Resource Settings	May 23, 2023	Active Learning, Language Modeling	Code Available	0
MathDial: A Dialogue Tutoring Dataset with Rich Pedagogical Properties Grounded in Math Reasoning Problems	May 23, 2023	Language Modelling, Large Language Model	Code Available	1
On Robustness of Finetuned Transformer-based NLP Models	May 23, 2023	Decoder, Language Modelling	Code Available	0
From Characters to Words: Hierarchical Pre-trained Language Model for Open-vocabulary Language Understanding	May 23, 2023	Language Modeling, Language Modelling	Unverified	0
Graph Meets LLM: A Novel Approach to Collaborative Filtering for Robust Conversational Understanding	May 23, 2023	Automatic Speech Recognition, Automatic Speech Recognition (ASR)	Unverified	0
Cascaded Beam Search: Plug-and-Play Terminology-Forcing For Neural Machine Translation	May 23, 2023	Language Modeling, Language Modelling	Unverified	0
FOCUS: Effective Embedding Initialization for Monolingual Specialization of Multilingual Models	May 23, 2023	Language Modeling, Language Modelling	Code Available	1
Exploring Contrast Consistency of Open-Domain Question Answering Systems on Minimally Edited Questions	May 23, 2023	Data Augmentation, Language Modeling	Code Available	0
Acquiring Frame Element Knowledge with Deep Metric Learning for Semantic Frame Induction	May 23, 2023	Clustering, Language Modeling	Unverified	0
Images in Language Space: Exploring the Suitability of Large Language Models for Vision & Language Tasks	May 23, 2023	Few-Shot Learning, Language Modeling	Code Available	0
Faithful and Efficient Explanations for Neural Networks via Neural Tangent Kernel Surrogate Models	May 23, 2023	Data Poisoning, Language Modelling	Code Available	0
Domain Private Transformers for Multi-Domain Dialog Systems	May 23, 2023	domain classification, Language Modeling	Code Available	0
Goal-Driven Explainable Clustering via Language Descriptions	May 23, 2023	Clustering, Language Modelling	Code Available	1
Error Detection for Text-to-SQL Semantic Parsing	May 23, 2023	Language Modeling, Language Modelling	Code Available	0
ConGraT: Self-Supervised Contrastive Pretraining for Joint Graph and Text Embeddings	May 23, 2023	Community Detection, Contrastive Learning	Code Available	1
AxomiyaBERTa: A Phonologically-aware Transformer Model for Assamese	May 23, 2023	Language Modeling, Language Modelling	Code Available	0
APPLS: Evaluating Evaluation Metrics for Plain Language Summarization	May 23, 2023	Informativeness, Language Modelling	Code Available	0
Enhancing Black-Box Few-Shot Text Classification with Prompt-Based Data Augmentation	May 23, 2023	Data Augmentation, Few-Shot Text Classification	Unverified	0
Aligning Large Language Models through Synthetic Feedback	May 23, 2023	Language Modeling, Language Modelling	Code Available	1
CLIP4STR: A Simple Baseline for Scene Text Recognition with Pre-trained Vision-Language Model	May 23, 2023	Decoder, Language Modeling	Code Available	1
When your Cousin has the Right Connections: Unsupervised Bilingual Lexicon Induction for Related Data-Imbalanced Languages	May 23, 2023	Bilingual Lexicon Induction, Language Modeling	Code Available	0
Discrete Prompt Optimization via Constrained Generation for Zero-shot Re-ranker	May 23, 2023	Information Retrieval, Language Modeling	Code Available	0
Do All Languages Cost the Same? Tokenization in the Era of Commercial Language Models	May 23, 2023	All, Fairness	Unverified	0
GenSpectrum Chat: Data Exploration in Public Health Using Large Language Models	May 23, 2023	Chatbot, Language Modelling	Unverified	0
Query Rewriting for Retrieval-Augmented Large Language Models	May 23, 2023	Language Modeling, Language Modelling	Unverified	0
Leveraging Open Information Extraction for More Robust Domain Transfer of Event Trigger Detection	May 23, 2023	Event Detection, Language Modeling	Code Available	0
Sophia: A Scalable Stochastic Second-order Optimizer for Language Model Pre-training	May 23, 2023	Language Modeling, Language Modelling	Code Available	2
Learning from Mistakes via Cooperative Study Assistant for Large Language Models	May 23, 2023	Imitation Learning, Language Modeling	Code Available	0
R2H: Building Multimodal Navigation Helpers that Respond to Help Requests	May 23, 2023	Benchmarking, Language Modeling	Unverified	0
The Knowledge Alignment Problem: Bridging Human and External Knowledge for Large Language Models	May 23, 2023	Hallucination, Language Modeling	Code Available	0
Latent Positional Information is in the Self-Attention Variance of Transformer Language Models Without Positional Embeddings	May 23, 2023	Language Modeling, Language Modelling	Unverified	0
Towards A Unified View of Sparse Feed-Forward Network in Pretraining Large Language Model	May 23, 2023	Avg, Language Modeling	Unverified	0
Prompt-Based Monte-Carlo Tree Search for Goal-Oriented Dialogue Policy Planning	May 23, 2023	Language Modeling, Language Modelling	Code Available	1
LLM-grounded Diffusion: Enhancing Prompt Understanding of Text-to-Image Diffusion Models with Large Language Models	May 23, 2023	Common Sense Reasoning, Image Generation	Code Available	2

Datasets: WikiText-103, Penn Treebank (Word Level), enwik8, The Pile, WikiText-2, LAMBADA, One Billion Word, Text8, Penn Treebank (Character Level), Hutter Prize, OpenWebText, SALMon

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	Decay RNN	Validation perplexity	76.67	—	Unverified
2	GRU	Validation perplexity	53.78	—	Unverified
3	LSTM	Validation perplexity	52.73	—	Unverified
4	LSTM	Test perplexity	48.7	—	Unverified
5	Temporal CNN	Test perplexity	45.2	—	Unverified
6	TCN	Test perplexity	45.19	—	Unverified
7	GCNN-8	Test perplexity	44.9	—	Unverified
8	Neural cache model (size = 100)	Test perplexity	44.8	—	Unverified
9	Neural cache model (size = 2,000)	Test perplexity	40.8	—	Unverified
10	GPT-2 Small	Test perplexity	37.5	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	TCN	Test perplexity	108.47	—	Unverified
2	Seq-U-Net	Test perplexity	107.95	—	Unverified
3	GRU (Bai et al., 2018)	Test perplexity	92.48	—	Unverified
4	R-Transformer	Test perplexity	84.38	—	Unverified
5	Zaremba et al. (2014) - LSTM (medium)	Test perplexity	82.7	—	Unverified
6	Gal & Ghahramani (2016) - Variational LSTM (medium)	Test perplexity	79.7	—	Unverified
7	LSTM (Bai et al., 2018)	Test perplexity	78.93	—	Unverified
8	Zaremba et al. (2014) - LSTM (large)	Test perplexity	78.4	—	Unverified
9	Gal & Ghahramani (2016) - Variational LSTM (large)	Test perplexity	75.2	—	Unverified
10	Inan et al. (2016) - Variational RHN	Test perplexity	66	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	LSTM (7 layers)	Bit per Character (BPC)	1.67	—	Unverified
2	Hypernetworks	Bit per Character (BPC)	1.34	—	Unverified
3	SHA-LSTM (4 layers, h=1024, no attention head)	Bit per Character (BPC)	1.33	—	Unverified
4	LN HM-LSTM	Bit per Character (BPC)	1.32	—	Unverified
5	ByteNet	Bit per Character (BPC)	1.31	—	Unverified
6	Recurrent Highway Networks	Bit per Character (BPC)	1.27	—	Unverified
7	Large FS-LSTM-4	Bit per Character (BPC)	1.25	—	Unverified
8	Large mLSTM	Bit per Character (BPC)	1.24	—	Unverified
9	AWD-LSTM (3 layers)	Bit per Character (BPC)	1.23	—	Unverified
10	Cluster-Former (#C=512)	Bit per Character (BPC)	1.22	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	Smaller Transformer 126M (pre-trained)	Test perplexity	33	—	Unverified
2	OPT 125M	Test perplexity	32.26	—	Unverified
3	Larger Transformer 771M (pre-trained)	Test perplexity	28.1	—	Unverified
4	OPT 1.3B	Test perplexity	19.55	—	Unverified
5	GPT-Neo 125M	Test perplexity	17.83	—	Unverified
6	OPT 2.7B	Test perplexity	17.81	—	Unverified
7	Smaller Transformer 126M (fine-tuned)	Test perplexity	12	—	Unverified
8	GPT-Neo 1.3B	Test perplexity	11.46	—	Unverified
9	Transformer 125M	Test perplexity	10.7	—	Unverified
10	GPT-Neo 2.7B	Test perplexity	10.44	—	Unverified
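
The tables above report perplexity and bits per character (BPC) without defining them. As a rough sketch of the standard definitions, perplexity is the exponential of the average negative log-likelihood per token, and BPC is the average negative base-2 log-likelihood per character; lower is better for both. The function names and the input format (a list of natural-log probabilities the model assigned to each observed token or character) are assumptions for illustration. The tokenization and evaluation setup behind the claimed numbers are not stated on this page, so the sketch is not a way to reproduce them.

```python
import math


def perplexity(token_log_probs):
    """exp(mean negative natural-log probability per token)."""
    return math.exp(-sum(token_log_probs) / len(token_log_probs))


def bits_per_character(char_log_probs):
    """Mean negative log2 probability per character.

    Inputs are natural logs, so dividing by ln(2) converts nats to bits.
    """
    return -sum(char_log_probs) / (len(char_log_probs) * math.log(2))


# Hypothetical model that assigns probability 0.25 to every observed symbol.
log_probs = [math.log(0.25)] * 10
print(perplexity(log_probs))
print(bits_per_character(log_probs))
```

A model that is equally unsure among four options at every step has a perplexity of about 4 and spends about 2 bits per symbol, which is one way to read the scale of the figures in the tables.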