SOTAVerified

Language Modelling

A language model is a model of natural language. Language models are useful for a variety of tasks, including speech recognition, machine translation, natural language generation (generating more human-like text), optical character recognition, route optimization, handwriting recognition, grammar induction, and information retrieval.

Large language models (LLMs), currently their most advanced form, are predominantly based on transformers trained on large datasets (frequently using text scraped from the public internet). They have superseded recurrent neural network-based models, which had in turn superseded purely statistical models such as the word n-gram language model.

Source: Wikipedia
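
The listings below range from word n-gram baselines to transformer LLMs. As a refresher on the oldest family mentioned above, here is a minimal sketch of a word bigram language model (a 2-gram); the toy corpus and the add-one smoothing choice are purely illustrative, not taken from any paper on this page:

```python
from collections import Counter

# Toy corpus (hypothetical); a real n-gram model is trained on a large text collection.
corpus = "the cat sat on the mat the cat ate".split()

unigrams = Counter(corpus)                  # counts of single words
bigrams = Counter(zip(corpus, corpus[1:]))  # counts of adjacent word pairs
vocab_size = len(unigrams)

def bigram_prob(prev_word: str, word: str) -> float:
    """Estimate P(word | prev_word) from counts, with add-one (Laplace) smoothing."""
    return (bigrams[(prev_word, word)] + 1) / (unigrams[prev_word] + vocab_size)

print(bigram_prob("the", "cat"))  # 0.333...: "the cat" occurs twice after three "the"s
print(bigram_prob("the", "ate"))  # 0.111...: unseen pair, nonzero only due to smoothing
```

Unlike an LLM, such a model conditions only on the previous n-1 words, which is why recurrent and then transformer models superseded it.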

Papers

Showing 14751–14800 of 17610 papers

Title | Status | Hype
SINA-BERT: A Pre-Trained Language Model for Analysis of Medical Texts in Persian | n/a | 0
Single headed attention based sequence-to-sequence model for state-of-the-art results on Switchboard | n/a | 0
Single-Read Reconstruction for DNA Data Storage Using Transformers | n/a | 0
Single-Shot Black-Box Adversarial Attacks Against Malware Detectors: A Causal Language Model Approach | n/a | 0
SiRA: Sparse Mixture of Low Rank Adaptation | n/a | 0
Relational recurrent neural networks | Code | 0
LEP-AD: Language Embedding of Proteins and Attention to Drugs predicts drug target interactions | Code | 0
Low Rank Factorizations are Indirect Encodings for Deep Neuroevolution | Code | 0
Low-Rank Constraints for Fast Inference in Structured Models | Code | 0
Language Modeling for Code-Switching: Evaluation, Integration of Monolingual Data, and Discriminative Training | Code | 0
Neural Authorship Attribution: Stylometric Analysis on Large Language Models | Code | 0
Neural Architecture Search with Reinforcement Learning | Code | 0
SPARSEFIT: Few-shot Prompting with Sparse Fine-tuning for Jointly Generating Predictions and Natural Language Explanations | Code | 0
Reinforced Large Language Model is a formal theorem prover | Code | 0
Neural Architecture Optimization | Code | 0
Layered Unlearning for Adversarial Relearning | Code | 0
Relevance in Dialogue: Is Less More? An Empirical Comparison of Existing Metrics, and a Novel Simple Metric | Code | 0
Neural Academic Paper Generation | Code | 0
Network Traffic Anomaly Detection Using Recurrent Neural Networks | Code | 0
Understand User Opinions of Large Language Models via LLM-Powered In-the-Moment User Experience Interviews | Code | 0
Reliable Academic Conference Question Answering: A Study Based on Large Language Model | Code | 0
Lower Perplexity is Not Always Human-Like | Code | 0
Language Model Guided Interpretable Video Action Reasoning | Code | 0
KRLS: Improving End-to-End Response Generation in Task Oriented Dialog with Reinforced Keywords Learning | Code | 0
Network-informed Prompt Engineering against Organized Astroturf Campaigns under Extreme Class Imbalance | Code | 0
Bias Amplification in Language Model Evolution: An Iterated Learning Perspective | Code | 0
Sparseout: Controlling Sparsity in Deep Networks | Code | 0
NeSy is alive and well: A LLM-driven symbolic approach for better code comment data generation and classification | Code | 0
Regulation of Language Models With Interpretability Will Likely Result In A Performance Trade-Off | Code | 0
UIO at SemEval-2023 Task 12: Multilingual fine-tuning for sentiment classification in low-resource languages | Code | 0
Sparse Sinkhorn Attention | Code | 0
NESTLE: a No-Code Tool for Statistical Analysis of Legal Corpus | Code | 0
Regularizing RNNs by Stabilizing Activations | Code | 0
Regularizing Neural Networks by Penalizing Confident Output Distributions | Code | 0
Keep It Private: Unsupervised Privatization of Online Text | Code | 0
Repairing Language Model Pipelines by Meta Self-Refining Competing Constraints at Runtime | Code | 0
Nested LSTMs | Code | 0
Towards understanding evolution of science through language model series | Code | 0
Regularized Training with Generated Datasets for Name-Only Transfer of Vision-Language Models | Code | 0
Text Counterfactuals via Latent Optimization and Shapley-Guided Search | Code | 0
Regularized Adaptive Momentum Dual Averaging with an Efficient Inexact Subproblem Solver for Training Structured Neural Network | Code | 0
Reframing linguistic bootstrapping as joint inference using visually-grounded grammar induction models | Code | 0
LANGUAGE MODEL EMBEDDINGS IMPROVE SENTIMENT ANALYSIS IN RUSSIAN | Code | 0
Towards Understanding of Medical Randomized Controlled Trials by Conclusion Generation | Code | 0
SpatialLLM: From Multi-modality Data to Urban Spatial Intelligence | Code | 0
Language Model Classifier Aligns Better with Physician Word Sensitivity than XGBoost on Readmission Prediction | Code | 0
Replacing Language Model for Style Transfer | Code | 0
KBioXLM: A Knowledge-anchored Biomedical Multilingual Pretrained Language Model | Code | 0
RepLiQA: A Question-Answering Dataset for Benchmarking LLMs on Unseen Reference Content | Code | 0
Re-framing Incremental Deep Language Models for Dialogue Processing with Multi-task Learning | Code | 0

Page 296 of 353

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | Decay RNN | Validation perplexity | 76.67 | n/a | Unverified
2 | GRU | Validation perplexity | 53.78 | n/a | Unverified
3 | LSTM | Validation perplexity | 52.73 | n/a | Unverified
4 | LSTM | Test perplexity | 48.7 | n/a | Unverified
5 | Temporal CNN | Test perplexity | 45.2 | n/a | Unverified
6 | TCN | Test perplexity | 45.19 | n/a | Unverified
7 | GCNN-8 | Test perplexity | 44.9 | n/a | Unverified
8 | Neural cache model (size = 100) | Test perplexity | 44.8 | n/a | Unverified
9 | Neural cache model (size = 2,000) | Test perplexity | 40.8 | n/a | Unverified
10 | GPT-2 Small | Test perplexity | 37.5 | n/a | Unverified
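
A note on reading the metric: test perplexity is the exponentiated average negative log-likelihood per token on a held-out test set, so lower is better; a perplexity of 37.5 means the model is, on average, as uncertain as a uniform choice among 37.5 tokens. A minimal sketch of the computation, using made-up token probabilities rather than anything from the tables:

```python
import math

# Hypothetical probabilities the model assigned to each true next token.
token_probs = [0.10, 0.25, 0.05, 0.30]

# Average negative log-likelihood per token (natural log), then exponentiate.
avg_nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
perplexity = math.exp(avg_nll)
print(f"perplexity = {perplexity:.2f}")  # 7.19 for this toy sequence
```
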
# | Model | Metric | Claimed | Verified | Status
1 | TCN | Test perplexity | 108.47 | n/a | Unverified
2 | Seq-U-Net | Test perplexity | 107.95 | n/a | Unverified
3 | GRU (Bai et al., 2018) | Test perplexity | 92.48 | n/a | Unverified
4 | R-Transformer | Test perplexity | 84.38 | n/a | Unverified
5 | Zaremba et al. (2014) - LSTM (medium) | Test perplexity | 82.7 | n/a | Unverified
6 | Gal & Ghahramani (2016) - Variational LSTM (medium) | Test perplexity | 79.7 | n/a | Unverified
7 | LSTM (Bai et al., 2018) | Test perplexity | 78.93 | n/a | Unverified
8 | Zaremba et al. (2014) - LSTM (large) | Test perplexity | 78.4 | n/a | Unverified
9 | Gal & Ghahramani (2016) - Variational LSTM (large) | Test perplexity | 75.2 | n/a | Unverified
10 | Inan et al. (2016) - Variational RHN | Test perplexity | 66 | n/a | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | LSTM (7 layers) | Bits per Character (BPC) | 1.67 | n/a | Unverified
2 | Hypernetworks | Bits per Character (BPC) | 1.34 | n/a | Unverified
3 | SHA-LSTM (4 layers, h=1024, no attention head) | Bits per Character (BPC) | 1.33 | n/a | Unverified
4 | LN HM-LSTM | Bits per Character (BPC) | 1.32 | n/a | Unverified
5 | ByteNet | Bits per Character (BPC) | 1.31 | n/a | Unverified
6 | Recurrent Highway Networks | Bits per Character (BPC) | 1.27 | n/a | Unverified
7 | Large FS-LSTM-4 | Bits per Character (BPC) | 1.25 | n/a | Unverified
8 | Large mLSTM | Bits per Character (BPC) | 1.24 | n/a | Unverified
9 | AWD-LSTM (3 layers) | Bits per Character (BPC) | 1.23 | n/a | Unverified
10 | Cluster-Former (#C=512) | Bits per Character (BPC) | 1.22 | n/a | Unverified
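
The table above reports bits per character (BPC) rather than perplexity: the average negative base-2 log-likelihood per character, again lower is better. The two metrics express the same cross-entropy in different units, since per-character perplexity equals 2 raised to the BPC. A minimal sketch with invented character probabilities:

```python
import math

# Hypothetical probabilities the model assigned to each true next character.
char_probs = [0.5, 0.25, 0.5, 0.125]

bpc = -sum(math.log2(p) for p in char_probs) / len(char_probs)
print(f"BPC = {bpc:.2f}")                       # 1.75 for this toy sequence
print(f"per-char perplexity = {2 ** bpc:.2f}")  # 3.36 = 2 ** 1.75
```
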
# | Model | Metric | Claimed | Verified | Status
1 | Smaller Transformer 126M (pre-trained) | Test perplexity | 33 | n/a | Unverified
2 | OPT 125M | Test perplexity | 32.26 | n/a | Unverified
3 | Larger Transformer 771M (pre-trained) | Test perplexity | 28.1 | n/a | Unverified
4 | OPT 1.3B | Test perplexity | 19.55 | n/a | Unverified
5 | GPT-Neo 125M | Test perplexity | 17.83 | n/a | Unverified
6 | OPT 2.7B | Test perplexity | 17.81 | n/a | Unverified
7 | Smaller Transformer 126M (fine-tuned) | Test perplexity | 12 | n/a | Unverified
8 | GPT-Neo 1.3B | Test perplexity | 11.46 | n/a | Unverified
9 | Transformer 125M | Test perplexity | 10.7 | n/a | Unverified
10 | GPT-Neo 2.7B | Test perplexity | 10.44 | n/a | Unverified