SOTAVerified

Language Modelling

A language model is a probabilistic model of natural language. Language models are useful for a variety of tasks, including speech recognition, machine translation, natural language generation (generating more human-like text), optical character recognition, handwriting recognition, grammar induction, and information retrieval.

Large language models (LLMs), currently their most advanced form, are predominantly based on transformers trained on large datasets (frequently using text scraped from the public internet). They have superseded recurrent neural network-based models, which had previously superseded purely statistical models such as word n-gram language models.

Source: Wikipedia
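As a concrete illustration of the oldest family mentioned above, the following is a minimal sketch of a word bigram (n = 2) language model with Laplace smoothing. The corpus, function names, and smoothing constant here are illustrative assumptions, not taken from the source.

```python
from collections import Counter

def train_bigram(tokens):
    """Estimate P(next | prev) from bigram and unigram counts."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    vocab = len(unigrams)

    def prob(prev, nxt, alpha=1.0):
        # Laplace (add-alpha) smoothed conditional probability
        return (bigrams[(prev, nxt)] + alpha) / (unigrams[prev] + alpha * vocab)

    return prob

corpus = "the cat sat on the mat the cat ran".split()
p = train_bigram(corpus)
# "the cat" occurs twice, "the ran" never, so the model prefers "cat" after "the".
assert p("the", "cat") > p("the", "ran")
```

With add-one smoothing over this toy corpus, P("cat" | "the") = (2 + 1) / (3 + 6) = 1/3, while the unseen continuation P("ran" | "the") = 1/9.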

Papers

Showing 10001–10050 of 17610 papers

Title (each paper below is listed with an empty Status and a Hype score of 0):

On the steerability of large language models toward data-driven personas
On the Thinking-Language Modeling Gap in Large Language Models
On the Transformer Growth for Progressive BERT Training
On The Truthfulness of 'Surprisingly Likely' Responses of Large Language Models
On the Use of Entity Embeddings from Pre-Trained Language Models for Knowledge Graph Completion
On the use of linguistic similarities to improve Neural Machine Translation for African Languages
OntoType: Ontology-Guided and Pre-Trained Language Model Assisted Fine-Grained Entity Typing
A Study of Backdoors in Instruction Fine-tuned Language Models
On Unified Prompt Tuning for Request Quality Assurance in Public Code Review
On Unsupervised Training of Link Grammar Based Language Models
OPAL: Ontology-Aware Pretrained Language Model for End-to-End Task-Oriented Dialogue
Open ASR for Icelandic: Resources and a Baseline System
Open-Category Human-Object Interaction Pre-Training via Language Modeling Framework
Open-Domain Dialogue Generation Based on Pre-trained Language Models
Open-Domain Name Error Detection using a Multi-Task RNN
OpenECAD: An Efficient Visual Language Model for Editable 3D-CAD Design
OpenHOI: Open-World Hand-Object Interaction Synthesis with Multimodal Large Language Model
Open Information Extraction for SOV Language Based on Entity-Predicate Pair Detection
A Survey on Open Information Extraction from Rule-based Model to Large Language Model
Open Information Extraction from Conjunctive Sentences
Towards Uncovering How Large Language Model Works: An Explainability Perspective
OpenReviewer: A Specialized Large Language Model for Generating Critical Scientific Paper Reviews
Open Set Video HOI detection from Action-Centric Chain-of-Look Prompting
Open-Source Conversational AI with SpeechBrain 1.0
OpenSplat3D: Open-Vocabulary 3D Instance Segmentation using Gaussian Splatting
OpenThaiGPT 1.5: A Thai-Centric Open Source Large Language Model
Open-vocabulary object 6D pose estimation
Open-Vocabulary Temporal Action Localization using Multimodal Guidance
Open-World Evaluation for Retrieving Diverse Perspectives
Open-World Object Manipulation using Pre-trained Vision-Language Models
Operationalizing CaMeL: Strengthening LLM Defenses for Enterprise Deployment
Ophtha-LLaMA2: A Large Language Model for Ophthalmology
OPI at SemEval 2023 Task 9: A Simple But Effective Approach to Multilingual Tweet Intimacy Analysis
Opportunities and Challenges of Generative-AI in Finance
OPSD: an Offensive Persian Social media Dataset and its baseline evaluations
Fine-Tuning Adaptive Stochastic Optimizers: Determining the Optimal Hyperparameter ε via Gradient Magnitude Histogram Analysis
Mapping of attention mechanisms to a generalized Potts model
Optimization of Retrieval-Augmented Generation Context with Outlier Detection
Rail-only: A Low-Cost High-Performance Network for Training LLMs with Trillion Parameters
Optimizing and Fine-tuning Large Language Model for Urban Renewal
Optimizing Estonian TV Subtitles with Semi-supervised Learning and LLMs
Optimizing Language Models for Human Preferences is a Causal Inference Problem
Optimizing Language Models for Inference Time Objectives using Reinforcement Learning
Optimizing Large Language Models for Turkish: New Methodologies in Corpus Selection and Training
Optimizing Large Language Models to Expedite the Development of Smart Contracts
Optimizing Large Language Models with an Enhanced LoRA Fine-Tuning Algorithm for Efficiency and Robustness in NLP Tasks
Optimizing Large Language Model Training Using FP4 Quantization
Optimizing Low-Resource Language Model Training: Comprehensive Analysis of Multi-Epoch, Multi-Lingual, and Two-Stage Approaches
Optimizing open-domain question answering with graph-based retrieval augmented generation
Optimizing Photonic Structures with Large Language Model Driven Algorithm Discovery
Page 201 of 353

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | Decay RNN | Validation perplexity | 76.67 | — | Unverified
2 | GRU | Validation perplexity | 53.78 | — | Unverified
3 | LSTM | Validation perplexity | 52.73 | — | Unverified
4 | LSTM | Test perplexity | 48.7 | — | Unverified
5 | Temporal CNN | Test perplexity | 45.2 | — | Unverified
6 | TCN | Test perplexity | 45.19 | — | Unverified
7 | GCNN-8 | Test perplexity | 44.9 | — | Unverified
8 | Neural cache model (size = 100) | Test perplexity | 44.8 | — | Unverified
9 | Neural cache model (size = 2,000) | Test perplexity | 40.8 | — | Unverified
10 | GPT-2 Small | Test perplexity | 37.5 | — | Unverified
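The perplexity metric these results report is the exponentiated mean negative log-likelihood the model assigns to the evaluation text (lower is better). A minimal sketch of the computation, with illustrative function names not taken from the source:

```python
import math

def perplexity(nlls):
    """Perplexity = exp(mean negative log-likelihood, in nats per token)."""
    return math.exp(sum(nlls) / len(nlls))

# A model that assigns probability 0.25 to every token has perplexity 4:
# it is as uncertain as a uniform choice over 4 tokens at each step.
nll = -math.log(0.25)
assert round(perplexity([nll] * 10), 6) == 4.0
```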
# | Model | Metric | Claimed | Verified | Status
1 | TCN | Test perplexity | 108.47 | — | Unverified
2 | Seq-U-Net | Test perplexity | 107.95 | — | Unverified
3 | GRU (Bai et al., 2018) | Test perplexity | 92.48 | — | Unverified
4 | R-Transformer | Test perplexity | 84.38 | — | Unverified
5 | Zaremba et al. (2014) - LSTM (medium) | Test perplexity | 82.7 | — | Unverified
6 | Gal & Ghahramani (2016) - Variational LSTM (medium) | Test perplexity | 79.7 | — | Unverified
7 | LSTM (Bai et al., 2018) | Test perplexity | 78.93 | — | Unverified
8 | Zaremba et al. (2014) - LSTM (large) | Test perplexity | 78.4 | — | Unverified
9 | Gal & Ghahramani (2016) - Variational LSTM (large) | Test perplexity | 75.2 | — | Unverified
10 | Inan et al. (2016) - Variational RHN | Test perplexity | 66 | — | Unverified
# | Model | Metric | Claimed | Verified | Status
1 | LSTM (7 layers) | Bit per Character (BPC) | 1.67 | — | Unverified
2 | Hypernetworks | Bit per Character (BPC) | 1.34 | — | Unverified
3 | SHA-LSTM (4 layers, h=1024, no attention head) | Bit per Character (BPC) | 1.33 | — | Unverified
4 | LN HM-LSTM | Bit per Character (BPC) | 1.32 | — | Unverified
5 | ByteNet | Bit per Character (BPC) | 1.31 | — | Unverified
6 | Recurrent Highway Networks | Bit per Character (BPC) | 1.27 | — | Unverified
7 | Large FS-LSTM-4 | Bit per Character (BPC) | 1.25 | — | Unverified
8 | Large mLSTM | Bit per Character (BPC) | 1.24 | — | Unverified
9 | AWD-LSTM (3 layers) | Bit per Character (BPC) | 1.23 | — | Unverified
10 | Cluster-Former (#C=512) | Bit per Character (BPC) | 1.22 | — | Unverified
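Bit per Character (BPC), the metric in the table above, is the character-level cross-entropy expressed in bits; it relates to per-character perplexity by perplexity = 2^BPC. A small sketch of the conversions (function names are illustrative):

```python
import math

def bits_per_character(nll_nats_per_char):
    """Convert mean negative log-likelihood in nats/char to bits per character."""
    return nll_nats_per_char / math.log(2)

def char_perplexity(bpc):
    """Per-character perplexity implied by a BPC value: 2 ** BPC."""
    return 2.0 ** bpc

# BPC of 1.0 means the model is, on average, as uncertain per character
# as a fair coin flip (per-character perplexity 2).
assert char_perplexity(1.0) == 2.0
```

Under this relation, the best entry above (1.22 BPC) corresponds to a per-character perplexity of about 2.33.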
# | Model | Metric | Claimed | Verified | Status
1 | Smaller Transformer 126M (pre-trained) | Test perplexity | 33 | — | Unverified
2 | OPT 125M | Test perplexity | 32.26 | — | Unverified
3 | Larger Transformer 771M (pre-trained) | Test perplexity | 28.1 | — | Unverified
4 | OPT 1.3B | Test perplexity | 19.55 | — | Unverified
5 | GPT-Neo 125M | Test perplexity | 17.83 | — | Unverified
6 | OPT 2.7B | Test perplexity | 17.81 | — | Unverified
7 | Smaller Transformer 126M (fine-tuned) | Test perplexity | 12 | — | Unverified
8 | GPT-Neo 1.3B | Test perplexity | 11.46 | — | Unverified
9 | Transformer 125M | Test perplexity | 10.7 | — | Unverified
10 | GPT-Neo 2.7B | Test perplexity | 10.44 | — | Unverified