SOTAVerified

Language Modelling

A language model is a model of natural language. Language models are useful for a variety of tasks, including speech recognition, machine translation, natural language generation (generating more human-like text), optical character recognition, route optimization, handwriting recognition, grammar induction, and information retrieval.

Large language models (LLMs), currently their most advanced form, are predominantly based on transformers trained on larger datasets (frequently using words scraped from the public internet). They have superseded recurrent neural network-based models, which had previously superseded purely statistical models such as word n-gram language models.
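To make the "purely statistical" end of that progression concrete, here is a minimal sketch of a word bigram language model with add-alpha smoothing. The corpus and function names are illustrative, not from any of the papers below:

```python
from collections import Counter

def train_bigram_lm(tokens, alpha=1.0):
    """Count unigrams and bigrams; alpha is the add-alpha smoothing constant."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    vocab_size = len(set(tokens))

    def prob(prev, word):
        # P(word | prev) with add-alpha smoothing over the vocabulary
        return (bigrams[(prev, word)] + alpha) / (unigrams[prev] + alpha * vocab_size)

    return prob

corpus = "the cat sat on the mat the cat ate".split()
p = train_bigram_lm(corpus)
# "cat" follows "the" twice in the toy corpus, so it outscores "sat"
assert p("the", "cat") > p("the", "sat")
```

Scaling the same idea to higher-order n-grams runs into data sparsity, which is what motivated the move to neural (RNN, then transformer) models.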

Source: Wikipedia

Papers

Showing 5601–5650 of 17610 papers

Title | Status | Hype
Advancing Momentum Pseudo-Labeling with Conformer and Initialization Strategy | – | 0
Advancing RNN Transducer Technology for Speech Recognition | – | 0
Advancing Single and Multi-task Text Classification through Large Language Model Fine-tuning | – | 0
Advantage Alignment Algorithms | – | 0
Adverbs, Surprisingly | – | 0
Adversarial and Domain-Aware BERT for Cross-Domain Sentiment Analysis | – | 0
Adversarial Contrastive Pre-training for Protein Sequences | – | 0
Adversarial Examples for DNA Classification | – | 0
Robustness to Modification with Shared Words in Paraphrase Identification | – | 0
Adversarial Generation of Natural Language | – | 0
Adversarial Negotiation Dynamics in Generative Language Models | – | 0
Adversarial Representation Learning for Text-to-Image Matching | – | 0
Adversarial Soft Prompt Tuning for Cross-Domain Sentiment Analysis | – | 0
Adversarial Text Purification: A Large Language Model Approach for Defense | – | 0
Adversarial Training of End-to-end Speech Recognition Using a Criticizing Language Model | – | 0
Adversarial Training of Word2Vec for Basket Completion | – | 0
Adversarial Training with Contrastive Learning in NLP | – | 0
Adversarial Transfer Learning for Punctuation Restoration | – | 0
Adversarial Transformer Language Models for Contextual Commonsense Inference | – | 0
Adversities are all you need: Classification of self-reported breast cancer posts on Twitter using Adversarial Fine-tuning | – | 0
Is your benchmark truly adversarial? AdvScore: Evaluating Human-Grounded Adversarialness | – | 0
A Dynamic Programming Algorithm for Computing N-gram Posteriors from Lattices | – | 0
Using Language Models to Assess the Fluency of Learner Sentences Before and After Teacher Edits (以語言模型評估學習者文句修改前後之流暢度) [In Chinese] | – | 0
Aerial Vision-and-Language Navigation via Semantic-Topo-Metric Representation Guided LLM Reasoning | – | 0
A Factorized Recurrent Neural Network based architecture for medium to large vocabulary Language Modelling | – | 0
A Fairness-Driven Method for Learning Human-Compatible Negotiation Strategies | – | 0
A Fast, Performant, Secure Distributed Training Framework For Large Language Model | – | 0
A federated large language model for long-term time series forecasting | – | 0
Affect-LM: A Neural Language Model for Customizable Affective Text Generation | – | 0
AffectON: Incorporating Affect Into Dialog Generation | – | 0
Affordance Perception by a Knowledge-Guided Vision-Language Model with Efficient Error Correction | – | 0
A Financial Service Chatbot based on Deep Bidirectional Transformers | – | 0
A Fine-Grained Analysis of BERTScore | – | 0
A Finite-State Approach to Phrase-Based Statistical Machine Translation | – | 0
A First South African Corpus of Multilingual Code-switched Soap Opera Speech | – | 0
A Flexible Approach to Automated RNN Architecture Generation | – | 0
α-Flow: A Unified Framework for Continuous-State Discrete Flow Matching Models | – | 0
A Foundational Multimodal Vision Language AI Assistant for Human Pathology | – | 0
A Fourth Wave of Open Data? Exploring the Spectrum of Scenarios for Open Data and Generative AI | – | 0
A Framework and Dataset for Abstract Art Generation via CalligraphyGAN | – | 0
A framework for anomaly detection using language modeling, and its applications to finance | – | 0
A Framework for Collaborating a Large Language Model Tool in Brainstorming for Triggering Creative Thoughts | – | 0
A Framework for Decoding Event-Related Potentials from Text | – | 0
A Taxonomy of Foundation Model based Systems through the Lens of Software Architecture | – | 0
A Framework for Evaluating LLMs Under Task Indeterminacy | – | 0
A Framework for Evaluating Vision-Language Model Safety: Building Trust in AI for Public Sector Applications | – | 0
A Framework for Real-time Safeguarding the Text Generation of Large Language Model | – | 0
The Responsible Development of Automated Student Feedback with Generative AI | – | 0
A Framework to Assess the Persuasion Risks Large Language Model Chatbots Pose to Democratic Societies | – | 0
AfriKI: Machine-in-the-Loop Afrikaans Poetry Generation | – | 0
Page 113 of 353

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | Decay RNN | Validation perplexity | 76.67 | – | Unverified
2 | GRU | Validation perplexity | 53.78 | – | Unverified
3 | LSTM | Validation perplexity | 52.73 | – | Unverified
4 | LSTM | Test perplexity | 48.7 | – | Unverified
5 | Temporal CNN | Test perplexity | 45.2 | – | Unverified
6 | TCN | Test perplexity | 45.19 | – | Unverified
7 | GCNN-8 | Test perplexity | 44.9 | – | Unverified
8 | Neural cache model (size = 100) | Test perplexity | 44.8 | – | Unverified
9 | Neural cache model (size = 2,000) | Test perplexity | 40.8 | – | Unverified
10 | GPT-2 Small | Test perplexity | 37.5 | – | Unverified
# | Model | Metric | Claimed | Verified | Status
1 | TCN | Test perplexity | 108.47 | – | Unverified
2 | Seq-U-Net | Test perplexity | 107.95 | – | Unverified
3 | GRU (Bai et al., 2018) | Test perplexity | 92.48 | – | Unverified
4 | R-Transformer | Test perplexity | 84.38 | – | Unverified
5 | Zaremba et al. (2014) - LSTM (medium) | Test perplexity | 82.7 | – | Unverified
6 | Gal & Ghahramani (2016) - Variational LSTM (medium) | Test perplexity | 79.7 | – | Unverified
7 | LSTM (Bai et al., 2018) | Test perplexity | 78.93 | – | Unverified
8 | Zaremba et al. (2014) - LSTM (large) | Test perplexity | 78.4 | – | Unverified
9 | Gal & Ghahramani (2016) - Variational LSTM (large) | Test perplexity | 75.2 | – | Unverified
10 | Inan et al. (2016) - Variational RHN | Test perplexity | 66 | – | Unverified
# | Model | Metric | Claimed | Verified | Status
1 | LSTM (7 layers) | Bits per Character (BPC) | 1.67 | – | Unverified
2 | Hypernetworks | Bits per Character (BPC) | 1.34 | – | Unverified
3 | SHA-LSTM (4 layers, h=1024, no attention head) | Bits per Character (BPC) | 1.33 | – | Unverified
4 | LN HM-LSTM | Bits per Character (BPC) | 1.32 | – | Unverified
5 | ByteNet | Bits per Character (BPC) | 1.31 | – | Unverified
6 | Recurrent Highway Networks | Bits per Character (BPC) | 1.27 | – | Unverified
7 | Large FS-LSTM-4 | Bits per Character (BPC) | 1.25 | – | Unverified
8 | Large mLSTM | Bits per Character (BPC) | 1.24 | – | Unverified
9 | AWD-LSTM (3 layers) | Bits per Character (BPC) | 1.23 | – | Unverified
10 | Cluster-Former (#C=512) | Bits per Character (BPC) | 1.22 | – | Unverified
# | Model | Metric | Claimed | Verified | Status
1 | Smaller Transformer 126M (pre-trained) | Test perplexity | 33 | – | Unverified
2 | OPT 125M | Test perplexity | 32.26 | – | Unverified
3 | Larger Transformer 771M (pre-trained) | Test perplexity | 28.1 | – | Unverified
4 | OPT 1.3B | Test perplexity | 19.55 | – | Unverified
5 | GPT-Neo 125M | Test perplexity | 17.83 | – | Unverified
6 | OPT 2.7B | Test perplexity | 17.81 | – | Unverified
7 | Smaller Transformer 126M (fine-tuned) | Test perplexity | 12 | – | Unverified
8 | GPT-Neo 1.3B | Test perplexity | 11.46 | – | Unverified
9 | Transformer 125M | Test perplexity | 10.7 | – | Unverified
10 | GPT-Neo 2.7B | Test perplexity | 10.44 | – | Unverified
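The two metrics in the tables above are both monotone functions of a model's average negative log-likelihood: perplexity exponentiates the per-token loss in nats, while bits per character expresses the per-character loss in bits. A minimal sketch of the conversions (function names are illustrative):

```python
import math

def perplexity(nll_nats_per_token):
    """Perplexity is the exponentiated average negative log-likelihood per token."""
    return math.exp(nll_nats_per_token)

def bits_per_character(nll_nats_per_char):
    """BPC is the same quantity measured per character, converted from nats to bits."""
    return nll_nats_per_char / math.log(2)

# A cross-entropy loss of ln(45.2) nats/token corresponds to test perplexity 45.2,
# and a loss of 1.23 * ln(2) nats/char corresponds to 1.23 BPC.
assert abs(perplexity(math.log(45.2)) - 45.2) < 1e-9
assert abs(bits_per_character(1.23 * math.log(2)) - 1.23) < 1e-9
```

This is why word-level and character-level results are not directly comparable: the same underlying loss yields very different-looking numbers depending on the tokenization unit and the choice of logarithm base.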