SOTAVerified

Language Modelling

A language model is a model of natural language. Language models are useful for a variety of tasks, including speech recognition, machine translation, natural language generation (generating more human-like text), optical character recognition, route optimization, handwriting recognition, grammar induction, and information retrieval.

Large language models (LLMs), currently their most advanced form, are predominantly based on transformers trained on larger datasets (frequently using words scraped from the public internet). They have superseded recurrent neural network-based models, which had previously superseded purely statistical models, such as word n-gram language models.

Source: Wikipedia
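The purely statistical word n-gram models mentioned above can be illustrated with a toy bigram model. This is a minimal sketch over a made-up corpus; the corpus and function names are illustrative and come from none of the papers listed below.

```python
from collections import Counter

# A minimal word-bigram language model with add-one (Laplace) smoothing.
# The corpus is a made-up example, not data from this page.
corpus = "the cat sat on the mat . the cat ate .".split()

unigrams = Counter(corpus)
bigrams = Counter(zip(corpus, corpus[1:]))
vocab = set(corpus)

def bigram_prob(prev: str, word: str) -> float:
    """P(word | prev) with add-one smoothing over the vocabulary."""
    return (bigrams[(prev, word)] + 1) / (unigrams[prev] + len(vocab))

# The model assigns higher probability to continuations seen in training.
print(bigram_prob("the", "cat"))  # seen bigram
print(bigram_prob("the", "ate"))  # unseen bigram, still nonzero after smoothing
```

Smoothing matters because an unseen bigram would otherwise get probability zero, making any sentence containing it impossible under the model.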

Papers

Showing 10551–10600 of 17610 papers

Title | Status | Hype
Prompt Tuning for Zero-shot Compositional Learning | – | 0
Prompt Tuning Pushes Farther, Contrastive Learning Pulls Closer: A Two-Stage Approach to Mitigate Social Biases | – | 0
PROMT DeepHybrid system for WMT12 shared translation task | – | 0
PronouncUR: An Urdu Pronunciation Lexicon Generator | – | 0
Pronoun Language Model and Grammatical Heuristics for Aiding Pronoun Prediction | – | 0
Pronoun Prediction with Latent Anaphora Resolution | – | 0
Pronoun Prediction with Linguistic Features and Example Weighing | – | 0
Proof Flow: Preliminary Study on Generative Flow Network Language Model Tuning for Formal Reasoning | – | 0
ProPath: Disease-Specific Protein Language Model for Variant Pathogenicity | – | 0
ProSG: Using Prompt Synthetic Gradients to Alleviate Prompt Forgetting of RNN-like Language Models | – | 0
ProSLM : A Prolog Synergized Language Model for explainable Domain Specific Knowledge Based Question Answering | – | 0
Prosody Learning Mechanism for Speech Synthesis System Without Text Length Limit | – | 0
ProsperAMnet at FinCausal 2020, Task 1 & 2: Modeling causality in financial texts using multi-headed transformers | – | 0
ProSwitch: Knowledge-Guided Instruction Tuning to Switch Between Professional and Non-Professional Responses | – | 0
Protein binding affinity prediction under multiple substitutions applying eGNNs on Residue and Atomic graphs combined with Language model information: eGRAL | – | 0
DisorderUnetLM: Validating ProteinUnet for efficient protein intrinsic disorder prediction | – | 0
Protein Language Model-Powered 3D Ligand Binding Site Prediction from Protein Sequence | – | 0
ProtFIM: Fill-in-Middle Protein Sequence Design via Protein Language Models | – | 0
ProtFlow: Fast Protein Sequence Design via Flow Matching on Compressed Protein Language Model Embeddings | – | 0
ProtIR: Iterative Refinement between Retrievers and Predictors for Protein Function Annotation | – | 0
ProtoDA: Efficient Transfer Learning for Few-Shot Intent Classification | – | 0
Protum: A New Method For Prompt Tuning Based on "[MASK]" | – | 0
Provably Confidential Language Modelling | – | 0
Provably Robust Watermarks for Open-Source Language Models | – | 0
Proverbs Run in Pairs: Evaluating Proverb Translation Capability of Large Language Model | – | 0
Prove Your Point!: Bringing Proof-Enhancement Principles to Argumentative Essay Generation | – | 0
Providing Insights for Open-Response Surveys via End-to-End Context-Aware Clustering | – | 0
Proxy-RLHF: Decoupling Generation and Alignment in Large Language Model with Proxy | – | 0
PRP: Propagating Universal Perturbations to Attack Large Language Model Guard-Rails | – | 0
PRS-Med: Position Reasoning Segmentation with Vision-Language Model in Medical Imaging | – | 0
Multi-modal Large Language Model Enhanced Pseudo 3D Perception Framework for Visual Commonsense Reasoning | – | 0
Pseudo-Autoregressive Neural Codec Language Models for Efficient Zero-Shot Text-to-Speech Synthesis | – | 0
Pseudointelligence: A Unifying Framework for Language Model Evaluation | – | 0
Pseudo-Label Guided Unsupervised Domain Adaptation of Contextual Embeddings | – | 0
Pseudo-Masked Language Models for Unified Language Model Pre-Training | – | 0
Pseudo-perplexity in One Fell Swoop for Protein Fitness Estimation | – | 0
PSG: Prompt-based Sequence Generation for Acronym Extraction | – | 0
PSLM: Parallel Generation of Text and Speech with LLMs for Low-Latency Spoken Dialogue Systems | – | 0
Psych-E: Configurable Response Generation using Personality Traits and Pragmatics | – | 0
Psychological Health Knowledge-Enhanced LLM-based Social Network Crisis Intervention Text Transfer Recognition Method | – | 0
PTab: Using the Pre-trained Language Model for Modeling Tabular Data | – | 0
PTR: A Pre-trained Language Model for Trajectory Recovery | – | 0
P-Tuning: Prompt Tuning Can Be Comparable to Fine-tuning Across Scales and Tasks | – | 0
Pulling Out the Stops: Rethinking Stopword Removal for Topic Models | – | 0
PUMGPT: A Large Vision-Language Model for Product Understanding | – | 0
Punctuation Prediction with Transition-based Parsing | – | 0
Punctuation restoration in Swedish through fine-tuned KB-BERT | – | 0
Purely sequence-trained neural networks for ASR based on lattice-free MMI | – | 0
Purifying Large Language Models by Ensembling a Small Language Model | – | 0
Purple Llama CyberSecEval: A Secure Coding Benchmark for Language Models | – | 0
Page 212 of 353

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | Decay RNN | Validation perplexity | 76.67 | – | Unverified
2 | GRU | Validation perplexity | 53.78 | – | Unverified
3 | LSTM | Validation perplexity | 52.73 | – | Unverified
4 | LSTM | Test perplexity | 48.7 | – | Unverified
5 | Temporal CNN | Test perplexity | 45.2 | – | Unverified
6 | TCN | Test perplexity | 45.19 | – | Unverified
7 | GCNN-8 | Test perplexity | 44.9 | – | Unverified
8 | Neural cache model (size = 100) | Test perplexity | 44.8 | – | Unverified
9 | Neural cache model (size = 2,000) | Test perplexity | 40.8 | – | Unverified
10 | GPT-2 Small | Test perplexity | 37.5 | – | Unverified
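The Claimed values above are perplexities: the exponentiated average negative log-likelihood a model assigns to the held-out tokens, so lower is better. A minimal sketch of the computation, using made-up placeholder probabilities rather than any model from the table:

```python
import math

def perplexity(token_probs):
    """exp of the mean negative log-probability over an evaluation set."""
    nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(nll)

# Sanity check: a model that assigns uniform probability 1/V to every token
# of a V-word vocabulary has perplexity exactly V.
print(perplexity([1 / 50000] * 10))  # ≈ 50000
```

Intuitively, a perplexity of 37.5 means the model is, on average, as uncertain about each next token as if it were choosing uniformly among 37.5 options.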
# | Model | Metric | Claimed | Verified | Status
1 | TCN | Test perplexity | 108.47 | – | Unverified
2 | Seq-U-Net | Test perplexity | 107.95 | – | Unverified
3 | GRU (Bai et al., 2018) | Test perplexity | 92.48 | – | Unverified
4 | R-Transformer | Test perplexity | 84.38 | – | Unverified
5 | Zaremba et al. (2014) - LSTM (medium) | Test perplexity | 82.7 | – | Unverified
6 | Gal & Ghahramani (2016) - Variational LSTM (medium) | Test perplexity | 79.7 | – | Unverified
7 | LSTM (Bai et al., 2018) | Test perplexity | 78.93 | – | Unverified
8 | Zaremba et al. (2014) - LSTM (large) | Test perplexity | 78.4 | – | Unverified
9 | Gal & Ghahramani (2016) - Variational LSTM (large) | Test perplexity | 75.2 | – | Unverified
10 | Inan et al. (2016) - Variational RHN | Test perplexity | 66 | – | Unverified
# | Model | Metric | Claimed | Verified | Status
1 | LSTM (7 layers) | Bit per Character (BPC) | 1.67 | – | Unverified
2 | Hypernetworks | Bit per Character (BPC) | 1.34 | – | Unverified
3 | SHA-LSTM (4 layers, h=1024, no attention head) | Bit per Character (BPC) | 1.33 | – | Unverified
4 | LN HM-LSTM | Bit per Character (BPC) | 1.32 | – | Unverified
5 | ByteNet | Bit per Character (BPC) | 1.31 | – | Unverified
6 | Recurrent Highway Networks | Bit per Character (BPC) | 1.27 | – | Unverified
7 | Large FS-LSTM-4 | Bit per Character (BPC) | 1.25 | – | Unverified
8 | Large mLSTM | Bit per Character (BPC) | 1.24 | – | Unverified
9 | AWD-LSTM (3 layers) | Bit per Character (BPC) | 1.23 | – | Unverified
10 | Cluster-Former (#C=512) | Bit per Character (BPC) | 1.22 | – | Unverified
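Bit per Character (BPC) is character-level cross-entropy expressed in bits: the average negative log2-probability the model assigns to each character. A loss computed in nats (the natural-log units most training frameworks report) converts by dividing by ln 2. A minimal sketch; the 0.85 example value is illustrative, not taken from any model above:

```python
import math

def nats_to_bpc(loss_nats: float) -> float:
    """Convert a character-level cross-entropy loss from nats to bits."""
    return loss_nats / math.log(2)

# e.g. a character-level cross-entropy of 0.85 nats:
print(nats_to_bpc(0.85))  # ≈ 1.226, in the range of the models above
```

Because log2(x) = ln(x) / ln(2), this is the same quantity as perplexity measured on a log2 scale per character.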
# | Model | Metric | Claimed | Verified | Status
1 | Smaller Transformer 126M (pre-trained) | Test perplexity | 33 | – | Unverified
2 | OPT 125M | Test perplexity | 32.26 | – | Unverified
3 | Larger Transformer 771M (pre-trained) | Test perplexity | 28.1 | – | Unverified
4 | OPT 1.3B | Test perplexity | 19.55 | – | Unverified
5 | GPT-Neo 125M | Test perplexity | 17.83 | – | Unverified
6 | OPT 2.7B | Test perplexity | 17.81 | – | Unverified
7 | Smaller Transformer 126M (fine-tuned) | Test perplexity | 12 | – | Unverified
8 | GPT-Neo 1.3B | Test perplexity | 11.46 | – | Unverified
9 | Transformer 125M | Test perplexity | 10.7 | – | Unverified
10 | GPT-Neo 2.7B | Test perplexity | 10.44 | – | Unverified