SOTAVerified

Language Modelling

A language model is a probabilistic model of natural language: it assigns probabilities to sequences of words. Language models are useful for a variety of tasks, including speech recognition, machine translation, natural language generation (producing human-like text), optical character recognition, handwriting recognition, grammar induction, and information retrieval.

Large language models (LLMs), currently their most advanced form, are predominantly based on transformers trained on large datasets, frequently text scraped from the public internet. They have superseded recurrent neural network-based models, which had in turn superseded purely statistical models such as the word n-gram language model.

Source: Wikipedia
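
To make the n-gram idea above concrete, here is a minimal sketch of a bigram (word 2-gram) language model with add-one smoothing. The toy corpus and function names are purely illustrative and not taken from any system listed on this page.

```python
from collections import Counter

def train_bigram_lm(tokens):
    """Count unigrams and bigrams in a token list."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    return unigrams, bigrams

def bigram_prob(unigrams, bigrams, prev, word, vocab_size):
    """P(word | prev) with add-one (Laplace) smoothing."""
    return (bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size)

# Toy corpus; real n-gram models were trained on millions of tokens.
tokens = "the cat sat on the mat the cat ran".split()
unigrams, bigrams = train_bigram_lm(tokens)
vocab = len(unigrams)
print(bigram_prob(unigrams, bigrams, "the", "cat", vocab))  # seen bigram: ~0.33
print(bigram_prob(unigrams, bigrams, "the", "ran", vocab))  # unseen bigram: ~0.11
```

Smoothing is what separates even this toy model from a raw frequency table: unseen bigrams receive a small nonzero probability instead of zero, which is essential when evaluating perplexity on held-out text.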

Papers

Showing 15151–15200 of 17610 papers

Title | Status | Hype
Promoting Open-domain Dialogue Generation through Learning Pattern Information between Contexts and Responses | Code | 0
LLM-RankFusion: Mitigating Intrinsic Inconsistency in LLM-based Ranking | Code | 0
Runaway is Ashamed, But Helpful: On the Early-Exit Behavior of Large Language Model-based Agents in Embodied Environments | Code | 0
Toward a Thermodynamics of Meaning | Code | 0
RuOpinionNE-2024: Extraction of Opinion Tuples from Russian News Texts | Code | 0
Russian Language Datasets in the Digital Humanities Domain and Their Evaluation with Word Embeddings | Code | 0
Russian Natural Language Generation: Creation of a Language Modelling Dataset and Evaluation with Modern Neural Architectures | Code | 0
Promoting Exploration in Memory-Augmented Adam using Critical Momenta | Code | 0
The Distributional Hypothesis Does Not Fully Explain the Benefits of Masked Language Model Pretraining | Code | 0
Prometheus Chatbot: Knowledge Graph Collaborative Large Language Model for Computer Components Recommendation | Code | 0
ProMap: Effective Bilingual Lexicon Induction via Language Model Prompting | Code | 0
Multimodal data matters: language model pre-training over structured and unstructured electronic health records | Code | 0
Project SHADOW: Symbolic Higher-order Associative Deductive reasoning On Wikidata using LM probing | Code | 0
LLM-QFL: Distilling Large Language Model for Quantum Federated Learning | Code | 0
S2SNet: A Pretrained Neural Network for Superconductivity Discovery | Code | 0
Language Generation via Combinatorial Constraint Satisfaction: A Tree Search Enhanced Monte-Carlo Approach | Code | 0
S3: A Simple Strong Sample-effective Multimodal Dialog System | Code | 0
LASMP: Language Aided Subset Sampling Based Motion Planner | Code | 0
JCSE: Contrastive Learning of Japanese Sentence Embeddings and Its Applications | Code | 0
Just ClozE! A Novel Framework for Evaluating the Factual Consistency Faster in Abstractive Summarization | Code | 0
Projective Methods for Mitigating Gender Bias in Pre-trained Language Models | Code | 0
The Effectiveness of Masked Language Modeling and Adapters for Factual Knowledge Injection | Code | 0
Public Attitudes Toward ChatGPT on Twitter: Sentiments, Topics, and Occupations | Code | 0
The Effect of Different Writing Tasks on Linguistic Style: A Case Study of the ROC Story Cloze Task | Code | 0
Story Ending Prediction by Transferable BERT | Code | 0
More Room for Language: Investigating the Effect of Retrieval on Language Models | Code | 0
StrassenNets: Deep Learning with a Multiplication Budget | Code | 0
Progressive Distillation Based on Masked Generation Feature Method for Knowledge Graph Completion | Code | 0
Progressive Class Semantic Matching for Semi-supervised Text Classification | Code | 0
Profiling Patient Transcript Using Large Language Model Reasoning Augmentation for Alzheimer's Disease Detection | Code | 0
SafetyPrompts: a Systematic Review of Open Datasets for Evaluating and Improving Large Language Model Safety | Code | 0
The effect of fine-tuning on language model toxicity | Code | 0
Professor Forcing: A New Algorithm for Training Recurrent Networks | Code | 0
LLMPC: Large Language Model Predictive Control | Code | 0
Learning to Maximize Mutual Information for Chain-of-Thought Distillation | Code | 0
More RLHF, More Trust? On The Impact of Preference Alignment On Trustworthiness | Code | 0
Product Information Extraction using ChatGPT | Code | 0
Procedural Dilemma Generation for Evaluating Moral Reasoning in Humans and Language Models | Code | 0
Learning to Locate Visual Answer in Video Corpus Using Question | Code | 0
Streaming Joint Speech Recognition and Disfluency Detection | Code | 0
The Effects of In-domain Corpus Size on pre-training BERT | Code | 0
MoRE-LLM: Mixture of Rule Experts Guided by a Large Language Model | Code | 0
SALM: A Multi-Agent Framework for Language Model-Driven Social Network Simulation | Code | 0
Problem-Solving in Language Model Networks | Code | 0
Probing the Robustness Properties of Neural Speech Codecs | Code | 0
LLM Honeypot: Leveraging Large Language Models as Advanced Interactive Honeypot Systems | Code | 0
More Expressive Attention with Negative Weights | Code | 0
The emergence of number and syntax units in LSTM language models | Code | 0
Probing the Capacity of Language Model Agents to Operationalize Disparate Experiential Context Despite Distraction | Code | 0
Probing Simile Knowledge from Pre-trained Language Models | Code | 0
Page 304 of 353

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | Decay RNN | Validation perplexity | 76.67 | – | Unverified
2 | GRU | Validation perplexity | 53.78 | – | Unverified
3 | LSTM | Validation perplexity | 52.73 | – | Unverified
4 | LSTM | Test perplexity | 48.7 | – | Unverified
5 | Temporal CNN | Test perplexity | 45.2 | – | Unverified
6 | TCN | Test perplexity | 45.19 | – | Unverified
7 | GCNN-8 | Test perplexity | 44.9 | – | Unverified
8 | Neural cache model (size = 100) | Test perplexity | 44.8 | – | Unverified
9 | Neural cache model (size = 2,000) | Test perplexity | 40.8 | – | Unverified
10 | GPT-2 Small | Test perplexity | 37.5 | – | Unverified
# | Model | Metric | Claimed | Verified | Status
1 | TCN | Test perplexity | 108.47 | – | Unverified
2 | Seq-U-Net | Test perplexity | 107.95 | – | Unverified
3 | GRU (Bai et al., 2018) | Test perplexity | 92.48 | – | Unverified
4 | R-Transformer | Test perplexity | 84.38 | – | Unverified
5 | Zaremba et al. (2014) - LSTM (medium) | Test perplexity | 82.7 | – | Unverified
6 | Gal & Ghahramani (2016) - Variational LSTM (medium) | Test perplexity | 79.7 | – | Unverified
7 | LSTM (Bai et al., 2018) | Test perplexity | 78.93 | – | Unverified
8 | Zaremba et al. (2014) - LSTM (large) | Test perplexity | 78.4 | – | Unverified
9 | Gal & Ghahramani (2016) - Variational LSTM (large) | Test perplexity | 75.2 | – | Unverified
10 | Inan et al. (2016) - Variational RHN | Test perplexity | 66 | – | Unverified
# | Model | Metric | Claimed | Verified | Status
1 | LSTM (7 layers) | Bits per Character (BPC) | 1.67 | – | Unverified
2 | Hypernetworks | Bits per Character (BPC) | 1.34 | – | Unverified
3 | SHA-LSTM (4 layers, h=1024, no attention head) | Bits per Character (BPC) | 1.33 | – | Unverified
4 | LN HM-LSTM | Bits per Character (BPC) | 1.32 | – | Unverified
5 | ByteNet | Bits per Character (BPC) | 1.31 | – | Unverified
6 | Recurrent Highway Networks | Bits per Character (BPC) | 1.27 | – | Unverified
7 | Large FS-LSTM-4 | Bits per Character (BPC) | 1.25 | – | Unverified
8 | Large mLSTM | Bits per Character (BPC) | 1.24 | – | Unverified
9 | AWD-LSTM (3 layers) | Bits per Character (BPC) | 1.23 | – | Unverified
10 | Cluster-Former (#C=512) | Bits per Character (BPC) | 1.22 | – | Unverified
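
Bits per character, the metric in the table above, is the mean negative log2-probability a character-level model assigns to each character of the test text; character-level perplexity is 2^BPC, so lower is better. A minimal sketch of the computation, with made-up probabilities standing in for real model outputs:

```python
import math

def bits_per_character(char_probs):
    """Mean negative log2-probability over the characters of a text."""
    return -sum(math.log2(p) for p in char_probs) / len(char_probs)

# Hypothetical per-character probabilities from some character-level model.
probs = [0.5, 0.25, 0.4, 0.8]
bpc = bits_per_character(probs)
print(f"BPC: {bpc:.3f}")                         # lower is better
print(f"Char-level perplexity: {2 ** bpc:.3f}")  # perplexity = 2**BPC
```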
# | Model | Metric | Claimed | Verified | Status
1 | Smaller Transformer 126M (pre-trained) | Test perplexity | 33 | – | Unverified
2 | OPT 125M | Test perplexity | 32.26 | – | Unverified
3 | Larger Transformer 771M (pre-trained) | Test perplexity | 28.1 | – | Unverified
4 | OPT 1.3B | Test perplexity | 19.55 | – | Unverified
5 | GPT-Neo 125M | Test perplexity | 17.83 | – | Unverified
6 | OPT 2.7B | Test perplexity | 17.81 | – | Unverified
7 | Smaller Transformer 126M (fine-tuned) | Test perplexity | 12 | – | Unverified
8 | GPT-Neo 1.3B | Test perplexity | 11.46 | – | Unverified
9 | Transformer 125M | Test perplexity | 10.7 | – | Unverified
10 | GPT-Neo 2.7B | Test perplexity | 10.44 | – | Unverified
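
The perplexity figures reported throughout these tables are the exponential of a model's average per-token negative log-likelihood on the held-out set; lower is better, and a uniform model over a vocabulary of size V scores exactly V. A minimal sketch, with invented per-token probabilities in place of real model outputs:

```python
import math

def perplexity(token_probs):
    """exp of the average negative log-likelihood per token."""
    nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(nll)

# Invented probabilities a model might assign to five test tokens.
probs = [0.1, 0.05, 0.2, 0.02, 0.15]
print(f"Test perplexity: {perplexity(probs):.2f}")

# Sanity check: a uniform model over a 10,000-word vocabulary assigns
# p = 1/10000 to every token, giving perplexity exactly 10000.
print(f"{perplexity([1 / 10000] * 5):.1f}")  # 10000.0
```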