SOTAVerified

Language Modelling

A language model is a model of natural language. Language models are useful for a variety of tasks, including speech recognition, machine translation, natural language generation (generating more human-like text), optical character recognition, route optimization, handwriting recognition, grammar induction, and information retrieval.

Large language models (LLMs), currently the most advanced form of language model, are predominantly transformers trained on large datasets (frequently on text scraped from the public internet). They have superseded recurrent neural network-based models, which had in turn superseded purely statistical models such as word n-gram language models.
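The word n-gram models mentioned above estimate each token's probability from counts of its preceding context. A minimal bigram (2-gram) sketch in Python, using a toy corpus purely for illustration:

```python
from collections import Counter

# Toy corpus; any tokenized text would do.
corpus = "the cat sat on the mat the cat ate".split()

# Count bigrams and the contexts (preceding words) they condition on.
bigrams = Counter(zip(corpus, corpus[1:]))
contexts = Counter(corpus[:-1])

def bigram_prob(prev, word):
    """Maximum-likelihood estimate of P(word | prev)."""
    if contexts[prev] == 0:
        return 0.0
    return bigrams[(prev, word)] / contexts[prev]

# 2 of the 3 occurrences of "the" are followed by "cat".
print(bigram_prob("the", "cat"))
```

Real n-gram models add smoothing (e.g. Kneser-Ney) so that unseen bigrams do not get zero probability; this sketch omits that.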

Source: Wikipedia

Papers

Showing 9701–9750 of 17610 papers

Title (Status and Hype columns omitted: every row below has a blank Status and a Hype of 0)

What Languages are Easy to Language-Model? A Perspective from Learning Probabilistic Regular Languages
Unlocking the Secrets of Linear Complexity Sequence Model from A Unified Perspective
Zero-Shot Dense Video Captioning by Jointly Optimizing Text and Moment
Zero-shot Entity and Tweet Characterization with Designed Conditional Prompts and Contexts
Word-Level Representation From Bytes For Language Modeling
Zero-shot Generalization in Dialog State Tracking through Generative Question Answering
Zero-shot Generation of Coherent Storybook from Plain Text Story using Diffusion Models
Zero-shot Hazard Identification in Autonomous Driving: A Case Study on the COOOL Benchmark
Zero-Shot Hierarchical Classification on the Common Procurement Vocabulary Taxonomy
Unlocking Video-LLM via Agent-of-Thoughts Distillation
xGQA: Cross-Lingual Visual Question Answering
Knowledge Injection into Dialogue Generation via Language Models
What represents “style” in authorship attribution?
Zero-shot information extraction from radiological reports using ChatGPT
Zero-Shot Learning Based Approach For Medieval Word Recognition Using Deep-Learned Features
Zero-Shot Learning of Language Models for Describing Human Actions Based on Semantic Compositionality of Actions
Zero-Shot Learning Over Large Output Spaces: Utilizing Indirect Knowledge Extraction from Large Language Models
Zero-Shot Listwise Document Reranking with a Large Language Model
Zero-shot Load Forecasting for Integrated Energy Systems: A Large Language Model-based Framework with Multi-task Learning
Unsupervised Learning on an Approximate Corpus
Zero-shot Object Navigation with Vision-Language Models Reasoning
Training-Free Action Recognition and Goal Inference with Dynamic Frame Selection
What Should Baby Models Read? Exploring Sample-Efficient Data Composition on Model Performance
What's in a Measurement? Using GPT-3 on SemEval 2021 Task 8 -- MeasEval
Zero-Shot Question Answering over Financial Documents using Large Language Models
What's in a Name? Beyond Class Indices for Image Recognition
Zero-shot Safety Prediction for Autonomous Robots with Foundation World Models
Zero-Shot Semantic Segmentation via Spatial and Multi-Scale Aware Visual Class Embedding
Word Midas Powered by StringNet: Discovering Lexicogrammatical Constructions in Situ
Zero-Shot Sharpness-Aware Quantization for Pre-trained Language Models
Zero-shot Text Classification With Generative Language Models
Unsupervised Melody Segmentation Based on a Nested Pitman-Yor Language Model
What's in Your Head? Emergent Behaviour in Multi-Task Transformer Models
Adapting Long Context NLM for ASR Rescoring in Conversational Agents
Zero-Shot Video Question Answering with Procedural Programs
Zero-Shot Dense Retrieval with Embeddings from Relevance Feedback
Unsupervised Method for Improving Arabic Speech Recognition Systems
Unsupervised Morphological Tree Tokenizer
Zeroth-Order Optimization Finds Flat Minima
ZEROTOP: Zero-Shot Task-Oriented Semantic Parsing using Large Language Models
ZETA: Leveraging Z-order Curves for Efficient Top-k Attention
Unsupervised Morphology-Based Vocabulary Expansion
Unsupervised morph segmentation and statistical language models for vocabulary expansion
ZeroShotDataAug: Generating and Augmenting Training Data with ChatGPT
ZiGong 1.0: A Large Language Model for Financial Credit
Zipporah: a Fast and Scalable Data Cleaning System for Noisy Web-Crawled Parallel Corpora
Ziya-Visual: Bilingual Large Vision-Language Model via Multi-Task Instruction Tuning
XGPT: Cross-modal Generative Pre-Training for Image Captioning
Unsupervised Multilingual Sentence Embeddings for Parallel Corpus Mining
Page 195 of 353

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | Decay RNN | Validation perplexity | 76.67 | — | Unverified
2 | GRU | Validation perplexity | 53.78 | — | Unverified
3 | LSTM | Validation perplexity | 52.73 | — | Unverified
4 | LSTM | Test perplexity | 48.7 | — | Unverified
5 | Temporal CNN | Test perplexity | 45.2 | — | Unverified
6 | TCN | Test perplexity | 45.19 | — | Unverified
7 | GCNN-8 | Test perplexity | 44.9 | — | Unverified
8 | Neural cache model (size = 100) | Test perplexity | 44.8 | — | Unverified
9 | Neural cache model (size = 2,000) | Test perplexity | 40.8 | — | Unverified
10 | GPT-2 Small | Test perplexity | 37.5 | — | Unverified
# | Model | Metric | Claimed | Verified | Status
1 | TCN | Test perplexity | 108.47 | — | Unverified
2 | Seq-U-Net | Test perplexity | 107.95 | — | Unverified
3 | GRU (Bai et al., 2018) | Test perplexity | 92.48 | — | Unverified
4 | R-Transformer | Test perplexity | 84.38 | — | Unverified
5 | Zaremba et al. (2014) - LSTM (medium) | Test perplexity | 82.7 | — | Unverified
6 | Gal & Ghahramani (2016) - Variational LSTM (medium) | Test perplexity | 79.7 | — | Unverified
7 | LSTM (Bai et al., 2018) | Test perplexity | 78.93 | — | Unverified
8 | Zaremba et al. (2014) - LSTM (large) | Test perplexity | 78.4 | — | Unverified
9 | Gal & Ghahramani (2016) - Variational LSTM (large) | Test perplexity | 75.2 | — | Unverified
10 | Inan et al. (2016) - Variational RHN | Test perplexity | 66 | — | Unverified
# | Model | Metric | Claimed | Verified | Status
1 | LSTM (7 layers) | Bits per Character (BPC) | 1.67 | — | Unverified
2 | Hypernetworks | Bits per Character (BPC) | 1.34 | — | Unverified
3 | SHA-LSTM (4 layers, h=1024, no attention head) | Bits per Character (BPC) | 1.33 | — | Unverified
4 | LN HM-LSTM | Bits per Character (BPC) | 1.32 | — | Unverified
5 | ByteNet | Bits per Character (BPC) | 1.31 | — | Unverified
6 | Recurrent Highway Networks | Bits per Character (BPC) | 1.27 | — | Unverified
7 | Large FS-LSTM-4 | Bits per Character (BPC) | 1.25 | — | Unverified
8 | Large mLSTM | Bits per Character (BPC) | 1.24 | — | Unverified
9 | AWD-LSTM (3 layers) | Bits per Character (BPC) | 1.23 | — | Unverified
10 | Cluster-Former (#C=512) | Bits per Character (BPC) | 1.22 | — | Unverified
# | Model | Metric | Claimed | Verified | Status
1 | Smaller Transformer 126M (pre-trained) | Test perplexity | 33 | — | Unverified
2 | OPT 125M | Test perplexity | 32.26 | — | Unverified
3 | Larger Transformer 771M (pre-trained) | Test perplexity | 28.1 | — | Unverified
4 | OPT 1.3B | Test perplexity | 19.55 | — | Unverified
5 | GPT-Neo 125M | Test perplexity | 17.83 | — | Unverified
6 | OPT 2.7B | Test perplexity | 17.81 | — | Unverified
7 | Smaller Transformer 126M (fine-tuned) | Test perplexity | 12 | — | Unverified
8 | GPT-Neo 1.3B | Test perplexity | 11.46 | — | Unverified
9 | Transformer 125M | Test perplexity | 10.7 | — | Unverified
10 | GPT-Neo 2.7B | Test perplexity | 10.44 | — | Unverified
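The tables above report test perplexity and bits per character (BPC). Both are simple transforms of a model's average negative log-likelihood on held-out text: perplexity exponentiates the per-token value, while BPC divides the per-character value by ln 2 to convert nats to bits. A minimal sketch of the conversions (the input values here are illustrative, not taken from the tables):

```python
import math

def perplexity(avg_nll_nats):
    """Perplexity = exp(mean negative log-likelihood, in nats per token)."""
    return math.exp(avg_nll_nats)

def bits_per_character(avg_nll_nats):
    """BPC = mean negative log-likelihood per character, converted to bits."""
    return avg_nll_nats / math.log(2)

# A model averaging 3.62 nats per word has perplexity ≈ 37.3;
# one averaging 0.85 nats per character scores ≈ 1.23 BPC.
print(round(perplexity(3.62), 1))
print(round(bits_per_character(0.85), 2))
```

Lower is better for both metrics, which is why the rows in each table are ordered from worst to best claimed score.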