SOTAVerified

Language Modelling

A language model is a statistical model that assigns probabilities to sequences of natural-language text. Language models are useful for a variety of tasks, including speech recognition, machine translation, natural language generation (generating more human-like text), optical character recognition, route optimization, handwriting recognition, grammar induction, and information retrieval.

Large language models (LLMs), currently their most advanced form, are predominantly transformer-based models trained on large datasets (frequently using words scraped from the public internet). They have superseded recurrent neural network-based models, which had previously superseded purely statistical models such as word n-gram language models.
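The purely statistical word n-gram approach mentioned above can be sketched in a few lines of Python. This is an illustrative count-based bigram model, not code from any of the listed papers; the name `train_bigram_lm` is made up for the example:

```python
from collections import Counter, defaultdict

def train_bigram_lm(corpus):
    """Count-based bigram model: P(w | prev) = count(prev, w) / count(prev)."""
    unigrams = Counter()
    bigrams = defaultdict(Counter)
    for sentence in corpus:
        # Sentence-boundary markers so the first and last words are conditioned too.
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        for prev, cur in zip(tokens, tokens[1:]):
            unigrams[prev] += 1
            bigrams[prev][cur] += 1
    def prob(prev, cur):
        return bigrams[prev][cur] / unigrams[prev] if unigrams[prev] else 0.0
    return prob

corpus = ["the cat sat", "the cat ran", "the dog sat"]
p = train_bigram_lm(corpus)
print(p("the", "cat"))  # 2 of the 3 words following "the" are "cat"
```

An unsmoothed model like this assigns zero probability to any unseen bigram, which is the sparsity problem that smoothing, and later neural models, were introduced to address.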

Source: Wikipedia

Papers

Showing 17551–17600 of 17610 papers

Title | Status | Hype
Combating Adversarial Attacks with Multi-Agent Debate | Code | 0
COMAL: A Convergent Meta-Algorithm for Aligning LLMs with General Preferences | Code | 0
Colorless green recurrent networks dream hierarchically | Code | 0
Fast Multipole Attention: A Divide-and-Conquer Attention Mechanism for Long Sequences | Code | 0
CoLMbo: Speaker Language Model for Descriptive Profiling | Code | 0
Fast or Better? Balancing Accuracy and Cost in Retrieval-Augmented Generation with Flexible User Control | Code | 0
Authorship Attribution Using a Neural Network Language Model | Code | 0
ICU: Conquering Language Barriers in Vision-and-Language Modeling by Dividing the Tasks into Image Captioning and Language Understanding | Code | 0
Collaborative Stance Detection via Small-Large Language Model Consistency Verification | Code | 0
Collaborative Development of NLP models | Code | 0
Fast-Slow Recurrent Neural Networks | Code | 0
Pre-training of Graph Augmented Transformers for Medication Recommendation | Code | 0
Fast, Small and Exact: Infinite-order Language Modelling with Compressed Suffix Trees | Code | 0
FASTSUBS: An Efficient and Exact Procedure for Finding the Most Likely Lexical Substitutes Based on an N-gram Language Model | Code | 0
ColBERT Retrieval and Ensemble Response Scoring for Language Model Question Answering | Code | 0
Analyzing constrained LLM through PDFA-learning | Code | 0
Graph-based Uncertainty Metrics for Long-form Language Model Outputs | Code | 0
Cognate Transformer for Automated Phonological Reconstruction and Cognate Reflex Prediction | Code | 0
Fast Training of Recurrent Neural Networks with Stationary State Feedbacks | Code | 0
Fast transcription of speech in low-resource languages | Code | 0
CogALex-VI Shared Task: Transrelation - A Robust Multilingual Language Model for Multilingual Relation Identification | Code | 0
FastTrees: Parallel Latent Tree-Induction for Faster Sequence Encoding | Code | 0
CoF: Coarse to Fine-Grained Image Understanding for Multi-modal Large Language Models | Code | 0
Why gradient clipping accelerates training: A theoretical justification for adaptivity | Code | 0
Coding Textual Inputs Boosts the Accuracy of Neural Networks | Code | 0
Improving Segmentation for Technical Support Problems | Code | 0
IDEA: Enhancing the Rule Learning Ability of Large Language Model Agent through Induction, Deduction, and Abduction | Code | 0
Code Soliloquies for Accurate Calculations in Large Language Models | Code | 0
Enhancing Source Code Classification Effectiveness via Prompt Learning Incorporating Knowledge Features | Code | 0
Decomposed Prompting to Answer Questions on a Course Discussion Board | Code | 0
Investigating and Extending Homans' Social Exchange Theory with Large Language Model based Agents | Code | 0
Adaptation of domain-specific transformer models with text oversampling for sentiment analysis of social media posts on Covid-19 vaccines | Code | 0
A Hybrid GA LLM Framework for Structured Task Optimization | Code | 0
Graphemic Normalization of the Perso-Arabic Script | Code | 0
AdaPlus: Integrating Nesterov Momentum and Precise Stepsize Adjustment on AdamW Basis | Code | 0
GraphextQA: A Benchmark for Evaluating Graph-Enhanced Large Language Models | Code | 0
Author Identification using Multi-headed Recurrent Neural Networks | Code | 0
Identifying and Adapting Transformer-Components Responsible for Gender Bias in an English Language Model | Code | 0
Identifying and Extracting Rare Disease Phenotypes with Large Language Models | Code | 0
Graph-Induced Syntactic-Semantic Spaces in Transformer-Based Variational AutoEncoders | Code | 0
Investigating and Scaling up Code-Switching for Multilingual Language Model Pre-Training | Code | 0
A Unifying View On Task-oriented Dialogue Annotation | Code | 0
Summarization-Based Document IDs for Generative Retrieval with Language Models | Code | 0
A Unified Taxonomy-Guided Instruction Tuning Framework for Entity Set Expansion and Taxonomy Expansion | Code | 0
CodeKGC: Code Language Model for Generative Knowledge Graph Construction | Code | 0
InPars-Light: Cost-Effective Unsupervised Training of Efficient Rankers | Code | 0
A Hybrid Convolutional Variational Autoencoder for Text Generation | Code | 0
A Unified Strategy for Multilingual Grammatical Error Correction with Pre-trained Cross-Lingual Language Model | Code | 0
Semantic Text Analysis for Detection of Compromised Accounts on Social Networks | Code | 0
Identifying Conspiracy Theories News based on Event Relation Graph | Code | 0
Page 352 of 353

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | Decay RNN | Validation perplexity | 76.67 | – | Unverified
2 | GRU | Validation perplexity | 53.78 | – | Unverified
3 | LSTM | Validation perplexity | 52.73 | – | Unverified
4 | LSTM | Test perplexity | 48.7 | – | Unverified
5 | Temporal CNN | Test perplexity | 45.2 | – | Unverified
6 | TCN | Test perplexity | 45.19 | – | Unverified
7 | GCNN-8 | Test perplexity | 44.9 | – | Unverified
8 | Neural cache model (size = 100) | Test perplexity | 44.8 | – | Unverified
9 | Neural cache model (size = 2,000) | Test perplexity | 40.8 | – | Unverified
10 | GPT-2 Small | Test perplexity | 37.5 | – | Unverified
# | Model | Metric | Claimed | Verified | Status
1 | TCN | Test perplexity | 108.47 | – | Unverified
2 | Seq-U-Net | Test perplexity | 107.95 | – | Unverified
3 | GRU (Bai et al., 2018) | Test perplexity | 92.48 | – | Unverified
4 | R-Transformer | Test perplexity | 84.38 | – | Unverified
5 | Zaremba et al. (2014) - LSTM (medium) | Test perplexity | 82.7 | – | Unverified
6 | Gal & Ghahramani (2016) - Variational LSTM (medium) | Test perplexity | 79.7 | – | Unverified
7 | LSTM (Bai et al., 2018) | Test perplexity | 78.93 | – | Unverified
8 | Zaremba et al. (2014) - LSTM (large) | Test perplexity | 78.4 | – | Unverified
9 | Gal & Ghahramani (2016) - Variational LSTM (large) | Test perplexity | 75.2 | – | Unverified
10 | Inan et al. (2016) - Variational RHN | Test perplexity | 66 | – | Unverified
# | Model | Metric | Claimed | Verified | Status
1 | LSTM (7 layers) | Bits per Character (BPC) | 1.67 | – | Unverified
2 | Hypernetworks | Bits per Character (BPC) | 1.34 | – | Unverified
3 | SHA-LSTM (4 layers, h=1024, no attention head) | Bits per Character (BPC) | 1.33 | – | Unverified
4 | LN HM-LSTM | Bits per Character (BPC) | 1.32 | – | Unverified
5 | ByteNet | Bits per Character (BPC) | 1.31 | – | Unverified
6 | Recurrent Highway Networks | Bits per Character (BPC) | 1.27 | – | Unverified
7 | Large FS-LSTM-4 | Bits per Character (BPC) | 1.25 | – | Unverified
8 | Large mLSTM | Bits per Character (BPC) | 1.24 | – | Unverified
9 | AWD-LSTM (3 layers) | Bits per Character (BPC) | 1.23 | – | Unverified
10 | Cluster-Former (#C=512) | Bits per Character (BPC) | 1.22 | – | Unverified
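Bits per character is the model's average negative log2-likelihood per character of the test text. A minimal sketch of the conversion from a summed loss in nats (the `bpc_from_nll` helper name is illustrative, not from any library):

```python
import math

def bpc_from_nll(total_nll_nats, num_chars):
    """Convert a summed negative log-likelihood (nats) to bits per character."""
    return total_nll_nats / (num_chars * math.log(2))

# A hypothetical model assigning each of 100 characters probability 0.4:
nll = -100 * math.log(0.4)
print(round(bpc_from_nll(nll, 100), 3))  # 1.322, in the range of the scores above
```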
# | Model | Metric | Claimed | Verified | Status
1 | Smaller Transformer 126M (pre-trained) | Test perplexity | 33 | – | Unverified
2 | OPT 125M | Test perplexity | 32.26 | – | Unverified
3 | Larger Transformer 771M (pre-trained) | Test perplexity | 28.1 | – | Unverified
4 | OPT 1.3B | Test perplexity | 19.55 | – | Unverified
5 | GPT-Neo 125M | Test perplexity | 17.83 | – | Unverified
6 | OPT 2.7B | Test perplexity | 17.81 | – | Unverified
7 | Smaller Transformer 126M (fine-tuned) | Test perplexity | 12 | – | Unverified
8 | GPT-Neo 1.3B | Test perplexity | 11.46 | – | Unverified
9 | Transformer 125M | Test perplexity | 10.7 | – | Unverified
10 | GPT-Neo 2.7B | Test perplexity | 10.44 | – | Unverified
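The test perplexity reported throughout these tables is the exponentiated mean negative log-likelihood per token; lower is better. A minimal sketch (the `perplexity` helper is an illustrative name, not a library function):

```python
import math

def perplexity(token_logprobs):
    """Perplexity = exp(mean negative log-likelihood), natural-log inputs."""
    return math.exp(-sum(token_logprobs) / len(token_logprobs))

# A model that assigns every token probability 1/50 has perplexity 50:
lp = [math.log(1 / 50)] * 4
print(round(perplexity(lp)))  # 50
```

Intuitively, a perplexity of 50 means the model is, on average, as uncertain as if it were choosing uniformly among 50 tokens at each step.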