SOTAVerified

Language Modelling

A language model is a model of natural language. Language models are useful for a variety of tasks, including speech recognition, machine translation, natural language generation (generating more human-like text), optical character recognition, route optimization, handwriting recognition, grammar induction, and information retrieval.

Large language models (LLMs), currently their most advanced form, are predominantly based on transformers trained on large datasets (frequently using words scraped from the public internet). They have superseded recurrent neural network-based models, which had previously superseded purely statistical models such as word n-gram language models.
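To illustrate the statistical approach mentioned above, here is a minimal word-bigram model sketch. The toy corpus and function names are illustrative assumptions, not taken from any listed paper; a real system would add smoothing for unseen bigrams.

```python
from collections import Counter

# Toy corpus (assumption, for illustration only).
corpus = "the cat sat on the mat the cat ran".split()

# Count unigrams and adjacent word pairs.
unigrams = Counter(corpus)
bigrams = Counter(zip(corpus, corpus[1:]))

def p(word, prev):
    """Maximum-likelihood estimate of P(word | prev)."""
    return bigrams[(prev, word)] / unigrams[prev]

# "the" occurs 3 times; 2 of those are followed by "cat".
print(p("cat", "the"))  # 2/3
```

An unsmoothed model like this assigns probability zero to any unseen bigram, which is the main weakness that smoothing (and later, neural models) addressed.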

Source: Wikipedia

Papers

Showing 5901–5950 of 17610 papers

Title | Status | Hype
Analyzing Linguistic Knowledge in Sequential Model of Sentence | — | 0
Analyzing Optimization for Statistical Machine Translation: MERT Learns Verbosity, PRO Learns Length | — | 0
Analyzing Pokémon and Mario Streamers' Twitch Chat with LLM-based User Embeddings | — | 0
Analyzing Information Leakage of Updates to Natural Language Models | — | 0
Analyzing Sentiment Polarity Reduction in News Presentation through Contextual Perturbation and Large Language Models | — | 0
Analyzing Similarity Metrics for Data Selection for Language Model Pretraining | — | 0
Analyzing the Efficacy of an LLM-Only Approach for Image-based Document Question Answering | — | 0
Analyzing the Implicit Position Encoding Ability of Transformer Decoder | — | 0
Analyzing the Impact of Fake News on the Anticipated Outcome of the 2024 Election Ahead of Time | — | 0
Analyzing the Performance of ChatGPT in Cardiology and Vascular Pathologies | — | 0
Analyzing the Roles of Language and Vision in Learning from Limited Data | — | 0
Analyzing the Structure of Attention in a Transformer Language Model | — | 0
Analyzing Zero-shot Cross-lingual Transfer in Supervised NLP Tasks | — | 0
An analysis of full-size Russian complexly NER labelled corpus of Internet user reviews on the drugs based on deep learning and language neural nets | — | 0
An analysis of incorporating an external language model into a sequence-to-sequence model | — | 0
An Analysis of the Ability of Statistical Language Models to Capture the Structural Properties of Language | — | 0
An Anchor Learning Approach for Citation Field Learning | — | 0
An Annotated Dataset and Automatic Approaches for Discourse Mode Identification in Low-resource Bengali Language | — | 0
Anaphora Models and Reordering for Phrase-Based SMT | — | 0
An Application of Pseudo-Log-Likelihoods to Natural Language Scoring | — | 0
An Appraisal-Based Chain-Of-Emotion Architecture for Affective Language Model Game Agents | — | 0
An Approach to Build Zero-Shot Slot-Filling System for Industry-Grade Conversational Assistants | — | 0
An Approach to Improve Robustness of NLP Systems against ASR Errors | — | 0
Neuradicon: operational representation learning of neuroimaging reports | — | 0
An Assessment of the Impact of OCR Noise on Language Models | — | 0
A natural language processing-based approach: mapping human perception by understanding deep semantic features in street view images | — | 0
An Automated Reinforcement Learning Reward Design Framework with Large Language Model for Cooperative Platoon Coordination | — | 0
An Automatic SOAP Classification System Using Weakly Supervision And Transfer Learning | — | 0
An Autonomous Large Language Model Agent for Chemical Literature Data Mining | — | 0
Anchor-based Robust Finetuning of Vision-Language Models | — | 0
Anchored Diffusion Language Model | — | 0
Anchor function: a type of benchmark functions for studying language models | — | 0
Anchor & Transform: Learning Sparse Representations of Discrete Objects | — | 0
Anchor & Transform: Learning Sparse Embeddings for Large Vocabularies | — | 0
AND does not mean OR: Using Formal Languages to Study Language Models' Representations | — | 0
An Early FIRST Reproduction and Improvements to Single-Token Decoding for Fast Listwise Reranking | — | 0
An Effective Contextual Language Modeling Framework for Speech Summarization with Augmented Features | — | 0
An Effective Data Creation Pipeline to Generate High-quality Financial Instruction Data for Large Language Model | — | 0
An Effective GCN-based Hierarchical Multi-label classification for Protein Function Prediction | — | 0
An Efficient Approach for Machine Translation on Low-resource Languages: A Case Study in Vietnamese-Chinese | — | 0
An efficient approach to represent enterprise web application structure using Large Language Model in the service of Intelligent Quality Engineering | — | 0
An Efficient Attention Mechanism for Sequential Recommendation Tasks: HydraRec | — | 0
An efficient language independent toolkit for complete morphological disambiguation | — | 0
An Efficient Language Model Using Double-Array Structures | — | 0
An empathic GPT-based chatbot to talk about mental disorders with Spanish teenagers | — | 0
An Empirical Comparison Between N-gram and Syntactic Language Models for Word Ordering | — | 0
An Empirical Comparison of LM-based Question and Answer Generation Methods | — | 0
An Empirical Evaluation of Noise Contrastive Estimation for the Neural Network Joint Model of Translation | — | 0
An Empirical Exploration in Quality Filtering of Text Data | — | 0
An Empirical Exploration of Local Ordering Pre-training for Structured Prediction | — | 0
Page 119 of 353

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | Decay RNN | Validation perplexity | 76.67 | — | Unverified
2 | GRU | Validation perplexity | 53.78 | — | Unverified
3 | LSTM | Validation perplexity | 52.73 | — | Unverified
4 | LSTM | Test perplexity | 48.7 | — | Unverified
5 | Temporal CNN | Test perplexity | 45.2 | — | Unverified
6 | TCN | Test perplexity | 45.19 | — | Unverified
7 | GCNN-8 | Test perplexity | 44.9 | — | Unverified
8 | Neural cache model (size = 100) | Test perplexity | 44.8 | — | Unverified
9 | Neural cache model (size = 2,000) | Test perplexity | 40.8 | — | Unverified
10 | GPT-2 Small | Test perplexity | 37.5 | — | Unverified
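For reference, the perplexity metric reported in these tables is the exponential of the average negative log-likelihood per token, so lower is better and a uniform model over k equally likely choices scores exactly k. A minimal sketch (the function name and example probabilities are ours):

```python
import math

def perplexity(token_probs):
    """exp of the average negative log-likelihood per token."""
    avg_nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(avg_nll)

# A model assigning probability 0.25 to each of four tokens
# behaves like a uniform choice over 4 options:
print(perplexity([0.25, 0.25, 0.25, 0.25]))  # ≈ 4.0
```

Intuitively, a test perplexity of 37.5 means the model is, on average, as uncertain as if it were choosing uniformly among about 37.5 words at each step.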
# | Model | Metric | Claimed | Verified | Status
1 | TCN | Test perplexity | 108.47 | — | Unverified
2 | Seq-U-Net | Test perplexity | 107.95 | — | Unverified
3 | GRU (Bai et al., 2018) | Test perplexity | 92.48 | — | Unverified
4 | R-Transformer | Test perplexity | 84.38 | — | Unverified
5 | Zaremba et al. (2014) - LSTM (medium) | Test perplexity | 82.7 | — | Unverified
6 | Gal & Ghahramani (2016) - Variational LSTM (medium) | Test perplexity | 79.7 | — | Unverified
7 | LSTM (Bai et al., 2018) | Test perplexity | 78.93 | — | Unverified
8 | Zaremba et al. (2014) - LSTM (large) | Test perplexity | 78.4 | — | Unverified
9 | Gal & Ghahramani (2016) - Variational LSTM (large) | Test perplexity | 75.2 | — | Unverified
10 | Inan et al. (2016) - Variational RHN | Test perplexity | 66 | — | Unverified
# | Model | Metric | Claimed | Verified | Status
1 | LSTM (7 layers) | Bit per Character (BPC) | 1.67 | — | Unverified
2 | Hypernetworks | Bit per Character (BPC) | 1.34 | — | Unverified
3 | SHA-LSTM (4 layers, h=1024, no attention head) | Bit per Character (BPC) | 1.33 | — | Unverified
4 | LN HM-LSTM | Bit per Character (BPC) | 1.32 | — | Unverified
5 | ByteNet | Bit per Character (BPC) | 1.31 | — | Unverified
6 | Recurrent Highway Networks | Bit per Character (BPC) | 1.27 | — | Unverified
7 | Large FS-LSTM-4 | Bit per Character (BPC) | 1.25 | — | Unverified
8 | Large mLSTM | Bit per Character (BPC) | 1.24 | — | Unverified
9 | AWD-LSTM (3 layers) | Bit per Character (BPC) | 1.23 | — | Unverified
10 | Cluster-Former (#C=512) | Bit per Character (BPC) | 1.22 | — | Unverified
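Bit per Character (BPC) is the character-level analogue of perplexity: the average negative log2-probability per character, so a BPC of 1.22 corresponds to a per-character perplexity of 2^1.22 ≈ 2.33. A minimal sketch (function name and example probabilities are ours):

```python
import math

def bpc(char_probs):
    """Average negative log2-probability per character."""
    return -sum(math.log2(p) for p in char_probs) / len(char_probs)

# A model assigning probability 0.5 to every character scores exactly 1 BPC.
print(bpc([0.5, 0.5, 0.5, 0.5]))  # 1.0

# Per-character perplexity implied by the best entry above:
print(2 ** 1.22)  # ≈ 2.33 effective choices per character
```

The conversion 2 ** BPC is why small BPC differences matter: the metric is logarithmic, so 1.67 vs 1.22 BPC is roughly 3.18 vs 2.33 effective choices per character.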
# | Model | Metric | Claimed | Verified | Status
1 | Smaller Transformer 126M (pre-trained) | Test perplexity | 33 | — | Unverified
2 | OPT 125M | Test perplexity | 32.26 | — | Unverified
3 | Larger Transformer 771M (pre-trained) | Test perplexity | 28.1 | — | Unverified
4 | OPT 1.3B | Test perplexity | 19.55 | — | Unverified
5 | GPT-Neo 125M | Test perplexity | 17.83 | — | Unverified
6 | OPT 2.7B | Test perplexity | 17.81 | — | Unverified
7 | Smaller Transformer 126M (fine-tuned) | Test perplexity | 12 | — | Unverified
8 | GPT-Neo 1.3B | Test perplexity | 11.46 | — | Unverified
9 | Transformer 125M | Test perplexity | 10.7 | — | Unverified
10 | GPT-Neo 2.7B | Test perplexity | 10.44 | — | Unverified