SOTAVerified

Language Modelling

A language model is a model of natural language. Language models are useful for a variety of tasks, including speech recognition, machine translation, natural language generation (producing human-like text), optical character recognition, route optimization, handwriting recognition, grammar induction, and information retrieval.

Large language models (LLMs), currently their most advanced form, are predominantly based on transformers trained on large datasets (frequently using text scraped from the public internet). They have superseded recurrent neural network-based models, which had previously superseded purely statistical models such as word n-gram language models.
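The purely statistical predecessors mentioned above can be sketched with a toy word-bigram model. The corpus, tokenization, and add-one smoothing below are illustrative assumptions, not any particular paper's method:

```python
import math
from collections import Counter

# Toy corpus, pre-tokenized on whitespace (an assumption for illustration).
corpus = "the cat sat on the mat . the dog sat on the rug .".split()

vocab = set(corpus)
unigrams = Counter(corpus)
bigrams = Counter(zip(corpus, corpus[1:]))

def bigram_prob(prev, word):
    """P(word | prev) with add-one (Laplace) smoothing over the vocabulary."""
    return (bigrams[(prev, word)] + 1) / (unigrams[prev] + len(vocab))

def sentence_log_prob(words):
    """Sum of log P(w_i | w_{i-1}) over a tokenized sentence."""
    return sum(math.log(bigram_prob(p, w)) for p, w in zip(words, words[1:]))

print(bigram_prob("the", "cat"))  # seen bigram: (1+1)/(4+8) ≈ 0.167
print(bigram_prob("the", "xyz" if "xyz" in vocab else "dog"))
print(sentence_log_prob("the cat sat on the mat".split()))
```

Smoothing keeps unseen bigrams from getting zero probability, which would otherwise make any sentence containing them impossible under the model.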

Source: Wikipedia

Papers

Showing 6201–6250 of 17610 papers

Title | Status | Hype
A Semi-universal Pipelined Approach to the CoNLL 2017 UD Shared Task | — | 0
A Sequence-to-Sequence Approach for Arabic Pronoun Resolution | — | 0
ASER: Activation Smoothing and Error Reconstruction for Large Language Model Quantization | — | 0
ASGen: Answer-containing Sentence Generation to Pre-Train Question Generator for Scale-up Data in Question Answering | — | 0
ASGM-KG: Unveiling Alluvial Gold Mining Through Knowledge Graphs | — | 0
ASGO: Adaptive Structured Gradient Optimization | — | 0
A Shift-Reduce Parsing Algorithm for Phrase-based String-to-Dependency Translation | — | 0
A Side-by-side Comparison of Transformers for English Implicit Discourse Relation Classification | — | 0
A Simple and Effective Method for Injecting Word-Level Information into Character-Aware Neural Language Models | — | 0
A Simple and Efficient Method To Generate Word Sense Representations | — | 0
A Simple and Efficient Probabilistic Language model for Code-Mixed Text | — | 0
A Simple Architecture for Enterprise Large Language Model Applications based on Role based security and Clearance Levels using Retrieval-Augmented Generation or Mixture of Experts | — | 0
N-Shot Learning for Augmenting Task-Oriented Dialogue State Tracking | — | 0
A Simple but Effective Method to Incorporate Multi-turn Context with BERT for Conversational Machine Comprehension | — | 0
A Simple Cache Model for Image Recognition | — | 0
Accelerating Multilingual Language Model for Excessively Tokenized Languages | — | 0
A Simple Fully Connected Network for Composing Word Embeddings from Characters | — | 0
A Simple Language Model based on PMI Matrix Approximations | — | 0
A Simple Model for Distantly Supervised Relation Extraction | — | 0
A Simple, Yet Effective Approach to Finding Biases in Code Generation | — | 0
A Simple yet Efficient Ensemble Approach for AI-generated Text Detection | — | 0
Multiperiodic Processes: Ergodic Sources with a Sublinear Entropy | — | 0
Ask Language Model to Clean Your Noisy Translation Data | — | 0
"Ask Me Anything": How Comcast Uses LLMs to Assist Agents in Real Time | — | 0
Just Ask One More Time! Self-Agreement Improves Reasoning of Language Models in (Almost) All Scenarios | — | 0
Ask Optimal Questions: Aligning Large Language Models with Retriever's Preference in Conversational Search | — | 0
A Small Claims Court for the NLP: Judging Legal Text Classification Strategies With Small Datasets | — | 0
ASOBEK at SemEval-2016 Task 1: Sentence Representation with Character N-gram Embeddings for Semantic Textual Similarity | — | 0
A Span Extraction Approach for Information Extraction on Visually-Rich Documents | — | 0
Aspect-based Academic Search using Domain-specific KB | — | 0
Aspect-Based Sentiment Analysis using BERT | — | 0
Aspect-Based Sentiment Analysis using Local Context Focus Mechanism with DeBERTa | — | 0
Aspect Oriented Suggestion Extraction from Online Reviews | — | 0
A Speed Odyssey for Deployable Quantization of LLMs | — | 0
A spelling correction model for end-to-end speech recognition | — | 0
AspirinSum: an Aspect-based utility-preserved de-identification Summarization framework | — | 0
A Split-and-Privatize Framework for Large Language Model Fine-Tuning | — | 0
ASR4REAL: An extended benchmark for speech models | — | 0
ASR Adaptation for E-commerce Chatbots using Cross-Utterance Context and Multi-Task Language Modeling | — | 0
ASRank: Zero-Shot Re-Ranking with Answer Scent for Document Retrieval | — | 0
ASR for Documenting Acutely Under-Resourced Indigenous Languages | — | 0
ASR-Generated Text for Language Model Pre-training Applied to Speech Tasks | — | 0
ASR Rescoring and Confidence Estimation with ELECTRA | — | 0
Assamese-English Bilingual Machine Translation | — | 0
Assessing and Enhancing the Robustness of LLM-based Multi-Agent Systems Through Chaos Engineering | — | 0
Assessing and Understanding Creativity in Large Language Models | — | 0
Assessing Discourse Relations in Language Generation from GPT-2 | — | 0
Assessing Generalization for Subpopulation Representative Modeling via In-Context Learning | — | 0
Assessing GPT4-V on Structured Reasoning Tasks | — | 0
Assessing Out-of-Domain Language Model Performance from Few Examples | — | 0
Page 125 of 353

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | Decay RNN | Validation perplexity | 76.67 | — | Unverified
2 | GRU | Validation perplexity | 53.78 | — | Unverified
3 | LSTM | Validation perplexity | 52.73 | — | Unverified
4 | LSTM | Test perplexity | 48.7 | — | Unverified
5 | Temporal CNN | Test perplexity | 45.2 | — | Unverified
6 | TCN | Test perplexity | 45.19 | — | Unverified
7 | GCNN-8 | Test perplexity | 44.9 | — | Unverified
8 | Neural cache model (size = 100) | Test perplexity | 44.8 | — | Unverified
9 | Neural cache model (size = 2,000) | Test perplexity | 40.8 | — | Unverified
10 | GPT-2 Small | Test perplexity | 37.5 | — | Unverified
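The perplexity figures reported in these tables are, by the standard definition, the exponential of the average per-token negative log-likelihood; lower means the model is less "surprised" by held-out text. A minimal sketch of that computation (the toy log-probabilities are assumptions for illustration):

```python
import math

def perplexity(token_log_probs):
    """Perplexity = exp of the mean negative log-likelihood (natural log)
    over a sequence of per-token log-probabilities."""
    nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(nll)

# A model assigning every token probability 1/50 has perplexity 50,
# i.e. it is as uncertain as a uniform choice among 50 tokens.
print(perplexity([math.log(1 / 50)] * 10))  # ≈ 50.0
```

Note that perplexities are only comparable across models when the tokenization and evaluation corpus are held fixed, which is why each table here keeps the metric constant.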
# | Model | Metric | Claimed | Verified | Status
1 | TCN | Test perplexity | 108.47 | — | Unverified
2 | Seq-U-Net | Test perplexity | 107.95 | — | Unverified
3 | GRU (Bai et al., 2018) | Test perplexity | 92.48 | — | Unverified
4 | R-Transformer | Test perplexity | 84.38 | — | Unverified
5 | Zaremba et al. (2014) - LSTM (medium) | Test perplexity | 82.7 | — | Unverified
6 | Gal & Ghahramani (2016) - Variational LSTM (medium) | Test perplexity | 79.7 | — | Unverified
7 | LSTM (Bai et al., 2018) | Test perplexity | 78.93 | — | Unverified
8 | Zaremba et al. (2014) - LSTM (large) | Test perplexity | 78.4 | — | Unverified
9 | Gal & Ghahramani (2016) - Variational LSTM (large) | Test perplexity | 75.2 | — | Unverified
10 | Inan et al. (2016) - Variational RHN | Test perplexity | 66 | — | Unverified
# | Model | Metric | Claimed | Verified | Status
1 | LSTM (7 layers) | Bit per Character (BPC) | 1.67 | — | Unverified
2 | Hypernetworks | Bit per Character (BPC) | 1.34 | — | Unverified
3 | SHA-LSTM (4 layers, h=1024, no attention head) | Bit per Character (BPC) | 1.33 | — | Unverified
4 | LN HM-LSTM | Bit per Character (BPC) | 1.32 | — | Unverified
5 | ByteNet | Bit per Character (BPC) | 1.31 | — | Unverified
6 | Recurrent Highway Networks | Bit per Character (BPC) | 1.27 | — | Unverified
7 | Large FS-LSTM-4 | Bit per Character (BPC) | 1.25 | — | Unverified
8 | Large mLSTM | Bit per Character (BPC) | 1.24 | — | Unverified
9 | AWD-LSTM (3 layers) | Bit per Character (BPC) | 1.23 | — | Unverified
10 | Cluster-Former (#C=512) | Bit per Character (BPC) | 1.22 | — | Unverified
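Bits per character (BPC), the metric in the character-level table above, is just cross-entropy expressed in base-2 rather than natural units, so it converts to and from perplexity directly. A small sketch of the standard conversions (the 0.85 input value is an illustrative assumption):

```python
import math

def bits_per_character(nats_per_char):
    """Convert average cross-entropy in nats/char to bits per character."""
    return nats_per_char / math.log(2)

def bpc_to_perplexity(bpc):
    """Character-level perplexity implied by a BPC value: 2 ** bpc."""
    return 2 ** bpc

print(bits_per_character(0.85))  # ≈ 1.23 bits/char
print(bpc_to_perplexity(1.23))   # ≈ 2.35 effective choices per character
```

BPC is preferred for character-level benchmarks because character vocabularies are tiny, which makes raw perplexities look deceptively small next to word-level numbers.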
# | Model | Metric | Claimed | Verified | Status
1 | Smaller Transformer 126M (pre-trained) | Test perplexity | 33 | — | Unverified
2 | OPT 125M | Test perplexity | 32.26 | — | Unverified
3 | Larger Transformer 771M (pre-trained) | Test perplexity | 28.1 | — | Unverified
4 | OPT 1.3B | Test perplexity | 19.55 | — | Unverified
5 | GPT-Neo 125M | Test perplexity | 17.83 | — | Unverified
6 | OPT 2.7B | Test perplexity | 17.81 | — | Unverified
7 | Smaller Transformer 126M (fine-tuned) | Test perplexity | 12 | — | Unverified
8 | GPT-Neo 1.3B | Test perplexity | 11.46 | — | Unverified
9 | Transformer 125M | Test perplexity | 10.7 | — | Unverified
10 | GPT-Neo 2.7B | Test perplexity | 10.44 | — | Unverified