SOTAVerified

Language Modelling

A language model is a model of natural language. Language models are useful for a variety of tasks, including speech recognition, machine translation, natural language generation (generating more human-like text), optical character recognition, route optimization, handwriting recognition, grammar induction, and information retrieval.

Large language models (LLMs), currently their most advanced form, are predominantly based on transformers trained on larger datasets (frequently using words scraped from the public internet). They have superseded recurrent neural network-based models, which had previously superseded the purely statistical models, such as the word n-gram language model.

Source: Wikipedia
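
The passage above traces a progression from purely statistical n-gram models through RNNs to transformers. As a concrete illustration of the oldest end of that progression, a word-bigram model fits in a few lines; this is a minimal sketch under illustrative assumptions (the toy corpus, function names, and the add-one smoothing choice are not from the source):

```python
from collections import Counter

# Toy corpus; purely illustrative.
corpus = "the cat sat on the mat . the dog sat on the rug .".split()

# Count each bigram and how often each word occurs as a left context.
bigram_counts = Counter(zip(corpus, corpus[1:]))
context_counts = Counter(corpus[:-1])
vocab_size = len(set(corpus))

def bigram_prob(prev: str, word: str) -> float:
    """P(word | prev) with add-one (Laplace) smoothing."""
    return (bigram_counts[(prev, word)] + 1) / (context_counts[prev] + vocab_size)

print(bigram_prob("the", "cat"))  # seen bigram: relatively high probability
print(bigram_prob("the", "sat"))  # unseen bigram: only the smoothing mass
```

Chaining such conditional probabilities over a sentence gives the model's probability for the whole sequence, which is exactly the quantity that the perplexity numbers in the benchmark tables below evaluate.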

Papers

Showing 8551–8600 of 17610 papers

Title | Status | Hype
J-CHAT: Japanese Large-scale Spoken Dialogue Corpus for Spoken Dialogue Language Modeling | – | 0
J-EDI QA: Benchmark for deep-sea organism-specific multimodal LLM | – | 0
Jeff Da at COIN - Shared Task: BIG MOOD: Relating Transformers to Explicit Commonsense Knowledge | – | 0
JEIT: Joint End-to-End Model and Internal Language Model Training for Speech Recognition | – | 0
Jellyfish: A Large Language Model for Data Preprocessing | – | 0
JELLY: Joint Emotion Recognition and Context Reasoning with LLMs for Conversational Speech Synthesis | – | 0
JEPA4Rec: Learning Effective Language Representations for Sequential Recommendation via Joint Embedding Predictive Architecture | – | 0
Jet Expansions of Residual Computation | – | 0
JHU System Description for the MADAR Arabic Dialect Identification Shared Task | – | 0
JIANG: Chinese Open Foundation Language Model | – | 0
JingFang: A Traditional Chinese Medicine Large Language Model of Expert-Level Medical Diagnosis and Syndrome Differentiation-Based Treatment | – | 0
Jira: a Kurdish Speech Recognition System Designing and Building Speech Corpus and Pronunciation Lexicon | – | 0
JiuZhang 2.0: A Unified Chinese Pre-trained Language Model for Multi-task Mathematical Problem Solving | – | 0
基于预训练语言模型的繁体古文自动句读研究 (Automatic Traditional Ancient Chinese Texts Segmentation and Punctuation Based on Pre-training Language Model) | – | 0
Joint Action Language Modelling for Transparent Policy Execution | – | 0
Joint Audio-Text Model for Expressive Speech-Driven 3D Facial Animation | – | 0
Joint Contextual Modeling for ASR Correction and Language Understanding | – | 0
Joint CTC/attention decoding for end-to-end speech recognition | – | 0
Joint Decoding of Tree Transduction Models for Sentence Compression | – | 0
Joint Encoder-Decoder Self-Supervised Pre-training for ASR | – | 0
Joint Estimation and Prediction of City-wide Delivery Demand: A Large Language Model Empowered Graph-based Learning Approach | – | 0
Joint Extraction of Entity and Relation with Information Redundancy Elimination | – | 0
Joint Language and Translation Modeling with Recurrent Neural Networks | – | 0
Joint Learning of Phonetic Units and Word Pronunciations for ASR | – | 0
Jointly Learning Author and Annotated Character N-gram Embeddings: A Case Study in Literary Text | – | 0
Jointly Learning to Embed and Predict with Multiple Languages | – | 0
Jointly Learning Word Representations and Composition Functions Using Predicate-Argument Structures | – | 0
Jointly Masked Sequence-to-Sequence Model for Non-Autoregressive Neural Machine Translation | – | 0
Jointly Reinforced User Simulator and Task-oriented Dialog System with Simplified Generative Architecture | – | 0
Jointly Trained Transformers models for Spoken Language Translation | – | 0
Joint Online Spoken Language Understanding and Language Modeling with Recurrent Neural Networks | – | 0
Joint Part-of-Speech and Language ID Tagging for Code-Switched Data | – | 0
Joint Recognition of Handwritten Text and Named Entities with a Neural End-to-end Model | – | 0
Joint Semantic and Structural Representation Learning for Enhancing User Preference Modelling | – | 0
Joint Semantic Knowledge Distillation and Masked Acoustic Modeling for Full-band Speech Restoration with Improved Intelligibility | – | 0
Joint Space Neural Probabilistic Language Model for Statistical Machine Translation | – | 0
Joint unsupervised and supervised learning for context-aware language identification | – | 0
Joint Unsupervised and Supervised Training for Multilingual ASR | – | 0
Joint Verification and Refinement of Language Models for Safety-Constrained Planning | – | 0
Joint WMT 2013 Submission of the QUAERO Project | – | 0
JPPO: Joint Power and Prompt Optimization for Accelerated Large Language Model Services | – | 0
Judgment of Thoughts: Courtroom of the Binary Logical Reasoning in Large Language Models | – | 0
JU NITM at IJCNLP-2017 Task 5: A Classification Approach for Answer Selection in Multi-choice Question Answering System | – | 0
Jurassic is (almost) All You Need: Few-Shot Meaning-to-Text Generation for Open-Domain Dialogue | – | 0
Juru: Legal Brazilian Large Language Model from Reputable Sources | – | 0
Just Add Functions: A Neural-Symbolic Language Model | – | 0
JUST at SemEval-2020 Task 11: Detecting Propaganda Techniques Using BERT Pre-trained Model | – | 0
Justifying Recommendations using Distantly-Labeled Reviews and Fine-Grained Aspects | – | 0
JU-USAAR: A Domain Adaptive MT System | – | 0
KACE: Generating Knowledge Aware Contrastive Explanations for Natural Language Inference | – | 0
Page 172 of 353

Benchmark Results
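
The tables below report two metrics: perplexity and bits per character (BPC). Both are monotone transforms of a model's average negative log-likelihood on held-out text, so lower is better; a BPC of b corresponds to a per-character perplexity of 2^b. A minimal sketch of the arithmetic, with invented token probabilities for illustration:

```python
import math

def perplexity(token_log_probs):
    """exp of the average negative natural-log likelihood per token."""
    return math.exp(-sum(token_log_probs) / len(token_log_probs))

def bits_per_character(char_log_probs):
    """Average negative log2-likelihood per character."""
    return -sum(char_log_probs) / (len(char_log_probs) * math.log(2))

# Hypothetical probabilities a model assigned to four held-out tokens.
lps = [math.log(p) for p in (0.2, 0.1, 0.25, 0.05)]
print(perplexity(lps))          # ≈ 7.95: like a uniform guess over ~8 tokens
print(bits_per_character(lps))  # ≈ 2.99 bits, if those tokens were characters
```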

# | Model | Metric | Claimed | Verified | Status
1 | Decay RNN | Validation perplexity | 76.67 | – | Unverified
2 | GRU | Validation perplexity | 53.78 | – | Unverified
3 | LSTM | Validation perplexity | 52.73 | – | Unverified
4 | LSTM | Test perplexity | 48.7 | – | Unverified
5 | Temporal CNN | Test perplexity | 45.2 | – | Unverified
6 | TCN | Test perplexity | 45.19 | – | Unverified
7 | GCNN-8 | Test perplexity | 44.9 | – | Unverified
8 | Neural cache model (size = 100) | Test perplexity | 44.8 | – | Unverified
9 | Neural cache model (size = 2,000) | Test perplexity | 40.8 | – | Unverified
10 | GPT-2 Small | Test perplexity | 37.5 | – | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | TCN | Test perplexity | 108.47 | – | Unverified
2 | Seq-U-Net | Test perplexity | 107.95 | – | Unverified
3 | GRU (Bai et al., 2018) | Test perplexity | 92.48 | – | Unverified
4 | R-Transformer | Test perplexity | 84.38 | – | Unverified
5 | Zaremba et al. (2014) - LSTM (medium) | Test perplexity | 82.7 | – | Unverified
6 | Gal & Ghahramani (2016) - Variational LSTM (medium) | Test perplexity | 79.7 | – | Unverified
7 | LSTM (Bai et al., 2018) | Test perplexity | 78.93 | – | Unverified
8 | Zaremba et al. (2014) - LSTM (large) | Test perplexity | 78.4 | – | Unverified
9 | Gal & Ghahramani (2016) - Variational LSTM (large) | Test perplexity | 75.2 | – | Unverified
10 | Inan et al. (2016) - Variational RHN | Test perplexity | 66 | – | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | LSTM (7 layers) | Bits per character (BPC) | 1.67 | – | Unverified
2 | Hypernetworks | Bits per character (BPC) | 1.34 | – | Unverified
3 | SHA-LSTM (4 layers, h=1024, no attention head) | Bits per character (BPC) | 1.33 | – | Unverified
4 | LN HM-LSTM | Bits per character (BPC) | 1.32 | – | Unverified
5 | ByteNet | Bits per character (BPC) | 1.31 | – | Unverified
6 | Recurrent Highway Networks | Bits per character (BPC) | 1.27 | – | Unverified
7 | Large FS-LSTM-4 | Bits per character (BPC) | 1.25 | – | Unverified
8 | Large mLSTM | Bits per character (BPC) | 1.24 | – | Unverified
9 | AWD-LSTM (3 layers) | Bits per character (BPC) | 1.23 | – | Unverified
10 | Cluster-Former (#C=512) | Bits per character (BPC) | 1.22 | – | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Smaller Transformer 126M (pre-trained) | Test perplexity | 33 | – | Unverified
2 | OPT 125M | Test perplexity | 32.26 | – | Unverified
3 | Larger Transformer 771M (pre-trained) | Test perplexity | 28.1 | – | Unverified
4 | OPT 1.3B | Test perplexity | 19.55 | – | Unverified
5 | GPT-Neo 125M | Test perplexity | 17.83 | – | Unverified
6 | OPT 2.7B | Test perplexity | 17.81 | – | Unverified
7 | Smaller Transformer 126M (fine-tuned) | Test perplexity | 12 | – | Unverified
8 | GPT-Neo 1.3B | Test perplexity | 11.46 | – | Unverified
9 | Transformer 125M | Test perplexity | 10.7 | – | Unverified
10 | GPT-Neo 2.7B | Test perplexity | 10.44 | – | Unverified