SOTAVerified

Language Modelling

A language model is a model of natural language. Language models are useful for a variety of tasks, including speech recognition, machine translation, natural language generation (generating more human-like text), optical character recognition, route optimization, handwriting recognition, grammar induction, and information retrieval.

Large language models (LLMs), currently their most advanced form, are predominantly based on transformers trained on large datasets (frequently using text scraped from the public internet). They have superseded recurrent neural network-based models, which had in turn superseded purely statistical models such as the word n-gram language model.

Source: Wikipedia
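The word n-gram models mentioned above can be illustrated with a minimal bigram model. This is only a sketch: the toy corpus, the `train_bigram` function name, and the add-alpha smoothing constant are illustrative assumptions, not anything from this page.

```python
from collections import Counter

def train_bigram(tokens, alpha=1.0):
    """Build an add-alpha smoothed bigram model; returns P(word | prev)."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    vocab_size = len(set(tokens))

    def prob(prev, word):
        # P(word | prev) with add-alpha smoothing over the vocabulary
        return (bigrams[(prev, word)] + alpha) / (unigrams[prev] + alpha * vocab_size)

    return prob

corpus = "the cat sat on the mat the cat ate".split()
p = train_bigram(corpus)
# "the" is followed by "cat" twice and by "mat" once in the corpus,
# so the smoothed estimate P(cat | the) exceeds P(mat | the)
assert p("the", "cat") > p("the", "mat")
```

Modern LLMs replace these count-based conditional probabilities with a neural network, but the objective is the same: assign a probability to the next token given its context.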

Papers

Showing 9451–9500 of 17610 papers

Title | Status | Hype
AD-AutoGPT: An Autonomous GPT for Alzheimer's Disease Infodemiology | - | 0
Unlocking the Potential of User Feedback: Leveraging Large Language Model as User Simulator to Enhance Dialogue System | Code | 0
Just One Byte (per gradient): A Note on Low-Bandwidth Decentralized Language Model Finetuning Using Shared Randomness | Code | 1
Process Knowledge-infused Learning for Clinician-friendly Explanations | - | 0
Learning to Summarize and Answer Questions about a Virtual Robot's Past Actions | - | 0
Inspire creativity with ORIBA: Transform Artists' Original Characters into Chatbots through Large Language Model | - | 0
CMLM-CSE: Based on Conditional MLM Contrastive Learning for Sentence Embeddings | - | 0
Semantic HELM: A Human-Readable Memory for Reinforcement Learning | Code | 1
Propagating Knowledge Updates to LMs Through Distillation | Code | 1
ChessGPT: Bridging Policy Learning and Language Modeling | Code | 1
Exploring the MIT Mathematics and EECS Curriculum Using Large Language Models | - | 0
Block-State Transformers | - | 0
Distillation Strategies for Discriminative Speech Recognition Rescoring | - | 0
Diffusion Models for Open-Vocabulary Segmentation | - | 0
Can ChatGPT pass the Vietnamese National High School Graduation Examination? | - | 0
Personalized Image Enhancement Featuring Masked Style Modeling | Code | 0
Language-Guided Music Recommendation for Video via Prompt Analogies | - | 0
Mapping Researcher Activity based on Publication Data by means of Transformers | - | 0
Pushing the Limits of Unsupervised Unit Discovery for SSL Speech Representation | Code | 1
Neural models for Factual Inconsistency Classification with Explanations | Code | 0
Macaw-LLM: Multi-Modal Language Modeling with Image, Audio, Video, and Text Integration | Code | 3
One Law, Many Languages: Benchmarking Multilingual Legal Reasoning for Judicial Support | Code | 0
Generate to Understand for Representation | Code | 1
Revealing the structure of language model capabilities | Code | 0
CLIPXPlore: Coupled CLIP and Shape Spaces for 3D Shape Exploration | - | 0
Recipes for Sequential Pre-training of Multilingual Encoder and Seq2Seq Models | - | 0
Toward Grounded Commonsense Reasoning | - | 0
Radiology-GPT: A Large Language Model for Radiology | - | 0
World-to-Words: Grounded Open Vocabulary Acquisition through Fast Mapping in Vision-Language Models | Code | 1
AVIS: Autonomous Visual Information Seeking with Large Language Model Agent | - | 0
Large-scale Language Model Rescoring on Long-form Data | - | 0
I See Dead People: Gray-Box Adversarial Attack on Image-To-Text Models | - | 0
PauseSpeech: Natural Speech Synthesis via Pre-trained Language Model and Pause-based Prosody Modeling | - | 0
NoCoLA: The Norwegian Corpus of Linguistic Acceptability | Code | 0
Tokenization with Factorized Subword Encoding | Code | 1
WebGLM: Towards An Efficient Web-Enhanced Question Answering System with Human Preferences | Code | 3
XrayGPT: Chest Radiographs Summarization using Medical Vision-Language Models | Code | 2
INT2.1: Towards Fine-Tunable Quantized Large Language Models with Error Correction through Low-Rank Adaptation | Code | 4
Augmenting Language Models with Long-Term Memory | - | 0
EriBERTa: A Bilingual Pre-Trained Language Model for Clinical Natural Language Processing | - | 0
Global and Local Semantic Completion Learning for Vision-Language Pre-training | Code | 1
Gradient Ascent Post-training Enhances Language Model Generalization | Code | 1
InstructP2P: Learning to Edit 3D Point Clouds with Text Instructions | - | 0
On the N-gram Approximation of Pre-trained Language Models | - | 0
Large language models and (non-)linguistic recursion | - | 0
Weakly supervised information extraction from inscrutable handwritten document images | - | 0
Valley: Video Assistant with Large Language model Enhanced abilitY | Code | 2
Waffling around for Performance: Visual Classification with Random Words and Broad Concepts | Code | 1
Are Intermediate Layers and Labels Really Necessary? A General Language Model Distillation Method | Code | 1
GKD: A General Knowledge Distillation Framework for Large-scale Pre-trained Language Model | Code | 1
Page 190 of 353 (50 papers per page)

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | Decay RNN | Validation perplexity | 76.67 | - | Unverified
2 | GRU | Validation perplexity | 53.78 | - | Unverified
3 | LSTM | Validation perplexity | 52.73 | - | Unverified
4 | LSTM | Test perplexity | 48.7 | - | Unverified
5 | Temporal CNN | Test perplexity | 45.2 | - | Unverified
6 | TCN | Test perplexity | 45.19 | - | Unverified
7 | GCNN-8 | Test perplexity | 44.9 | - | Unverified
8 | Neural cache model (size = 100) | Test perplexity | 44.8 | - | Unverified
9 | Neural cache model (size = 2,000) | Test perplexity | 40.8 | - | Unverified
10 | GPT-2 Small | Test perplexity | 37.5 | - | Unverified
# | Model | Metric | Claimed | Verified | Status
1 | TCN | Test perplexity | 108.47 | - | Unverified
2 | Seq-U-Net | Test perplexity | 107.95 | - | Unverified
3 | GRU (Bai et al., 2018) | Test perplexity | 92.48 | - | Unverified
4 | R-Transformer | Test perplexity | 84.38 | - | Unverified
5 | Zaremba et al. (2014) - LSTM (medium) | Test perplexity | 82.7 | - | Unverified
6 | Gal & Ghahramani (2016) - Variational LSTM (medium) | Test perplexity | 79.7 | - | Unverified
7 | LSTM (Bai et al., 2018) | Test perplexity | 78.93 | - | Unverified
8 | Zaremba et al. (2014) - LSTM (large) | Test perplexity | 78.4 | - | Unverified
9 | Gal & Ghahramani (2016) - Variational LSTM (large) | Test perplexity | 75.2 | - | Unverified
10 | Inan et al. (2016) - Variational RHN | Test perplexity | 66 | - | Unverified
# | Model | Metric | Claimed | Verified | Status
1 | LSTM (7 layers) | Bit per Character (BPC) | 1.67 | - | Unverified
2 | Hypernetworks | Bit per Character (BPC) | 1.34 | - | Unverified
3 | SHA-LSTM (4 layers, h=1024, no attention head) | Bit per Character (BPC) | 1.33 | - | Unverified
4 | LN HM-LSTM | Bit per Character (BPC) | 1.32 | - | Unverified
5 | ByteNet | Bit per Character (BPC) | 1.31 | - | Unverified
6 | Recurrent Highway Networks | Bit per Character (BPC) | 1.27 | - | Unverified
7 | Large FS-LSTM-4 | Bit per Character (BPC) | 1.25 | - | Unverified
8 | Large mLSTM | Bit per Character (BPC) | 1.24 | - | Unverified
9 | AWD-LSTM (3 layers) | Bit per Character (BPC) | 1.23 | - | Unverified
10 | Cluster-Former (#C=512) | Bit per Character (BPC) | 1.22 | - | Unverified
# | Model | Metric | Claimed | Verified | Status
1 | Smaller Transformer 126M (pre-trained) | Test perplexity | 33 | - | Unverified
2 | OPT 125M | Test perplexity | 32.26 | - | Unverified
3 | Larger Transformer 771M (pre-trained) | Test perplexity | 28.1 | - | Unverified
4 | OPT 1.3B | Test perplexity | 19.55 | - | Unverified
5 | GPT-Neo 125M | Test perplexity | 17.83 | - | Unverified
6 | OPT 2.7B | Test perplexity | 17.81 | - | Unverified
7 | Smaller Transformer 126M (fine-tuned) | Test perplexity | 12 | - | Unverified
8 | GPT-Neo 1.3B | Test perplexity | 11.46 | - | Unverified
9 | Transformer 125M | Test perplexity | 10.7 | - | Unverified
10 | GPT-Neo 2.7B | Test perplexity | 10.44 | - | Unverified
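The two metrics in the tables above are closely related: perplexity is the exponentiated mean negative log-likelihood per token, and bits per character is the same mean negative log-likelihood per character expressed in base 2. A minimal sketch of both conversions; the per-unit loss values below are made-up numbers for illustration, not results from any model in the tables.

```python
import math

def perplexity(nll_per_token):
    """Perplexity = exp(mean negative log-likelihood per token), losses in nats."""
    return math.exp(sum(nll_per_token) / len(nll_per_token))

def bits_per_char(nll_per_char):
    """BPC = mean negative log-likelihood per character, converted from nats to bits."""
    return sum(nll_per_char) / len(nll_per_char) / math.log(2)

# Illustrative per-token losses in nats (a real model reports these on a test set)
token_losses = [3.2, 3.6, 3.4, 3.8]
print(round(perplexity(token_losses), 2))   # mean loss 3.5 nats -> exp(3.5) ~ 33.12

# Illustrative per-character losses in nats
char_losses = [0.9, 0.8, 1.0]
print(round(bits_per_char(char_losses), 2))  # mean 0.9 nats -> 0.9 / ln 2 ~ 1.3 BPC
```

Because the two metrics differ only in the unit of prediction (token vs. character) and the logarithm base, numbers from the word-level perplexity tables and the BPC table are not directly comparable.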