SOTAVerified

Language Modelling

A language model is a model of natural language. Language models are useful for a variety of tasks, including speech recognition, machine translation, natural language generation (generating more human-like text), optical character recognition, route optimization, handwriting recognition, grammar induction, and information retrieval.

Large language models (LLMs), currently their most advanced form, are predominantly based on transformers trained on larger datasets (frequently using text scraped from the public internet). They have superseded recurrent neural network-based models, which had previously superseded purely statistical models such as word n-gram language models.

Source: Wikipedia
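The word n-gram models mentioned above assign probabilities purely by counting. A minimal sketch of a bigram model with add-one (Laplace) smoothing — all function and variable names here are illustrative, not from any particular library:

```python
from collections import Counter

def train_bigram_lm(sentences):
    """Count unigrams and bigrams over tokenized sentences."""
    unigrams, bigrams = Counter(), Counter()
    for tokens in sentences:
        tokens = ["<s>"] + tokens + ["</s>"]  # sentence boundary markers
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams

def bigram_prob(unigrams, bigrams, prev, word, vocab_size):
    """P(word | prev) with add-one smoothing over a vocabulary of vocab_size."""
    return (bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size)
```

Smoothing keeps unseen bigrams from getting zero probability; the add-one counts guarantee the distribution over any fixed next-token vocabulary still sums to 1.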

Papers

Showing 6701-6750 of 17610 papers

Title | Status | Hype
Beyond Masked and Unmasked: Discrete Diffusion Models via Partial Masking |  | 0
Beyond Metrics: A Critical Analysis of the Variability in Large Language Model Evaluation Frameworks |  | 0
Beyond Multiple Choice: Evaluating Steering Vectors for Adaptive Free-Form Summarization |  | 0
Domain Specialization as the Key to Make Large Language Models Disruptive: A Comprehensive Survey |  | 0
Beyond Relevant Documents: A Knowledge-Intensive Approach for Query-Focused Summarization using Large Language Models |  | 0
Not All Options Are Created Equal: Textual Option Weighting for Token-Efficient LLM-Based Knowledge Tracing |  | 0
Beyond Segmentation: Road Network Generation with Multi-Modal LLMs |  | 0
Beyond Self-Consistency: Ensemble Reasoning Boosts Consistency and Accuracy of LLMs in Cancer Staging |  | 0
Beyond Text Compression: Evaluating Tokenizers Across Scales |  | 0
Beyond Text: Implementing Multimodal Large Language Model-Powered Multi-Agent Systems Using a No-Code Platform |  | 0
Beyond Text: Utilizing Vocal Cues to Improve Decision Making in LLMs for Robot Navigation Tasks |  | 0
Beyond Turing: A Comparative Analysis of Approaches for Detecting Machine-Generated Text |  | 0
Beyond Visual Understanding: Introducing PARROT-360V for Vision Language Model Benchmarking |  | 0
Beyond Word-based Language Model in Statistical Machine Translation |  | 0
Beyond Words: On Large Language Models Actionability in Mission-Critical Risk Analysis |  | 0
BFClass: A Backdoor-free Text Classification Framework |  | 0
BGE Landmark Embedding: A Chunking-Free Embedding Method For Retrieval Augmented Long-Context Large Language Models |  | 0
Bias A-head? Analyzing Bias in Transformer-Based Language Model Attention Heads |  | 0
Biases in Predicting the Human Language Model |  | 0
Bias Neutralization Framework: Measuring Fairness in Large Language Models with Bias Intelligence Quotient (BiQ) |  | 0
BiasScanner: Automatic Detection and Classification of News Bias to Strengthen Democracy |  | 0
Bias Vector: Mitigating Biases in Language Models with Task Arithmetic Approach |  | 0
Biber Redux: Reconsidering Dimensions of Variation in American English |  | 0
Bi-directional Context-Enhanced Speech Large Language Models for Multilingual Conversational ASR |  | 0
Bidirectional Generative Adversarial Networks for Neural Machine Translation |  | 0
Bidirectional Language Models Are Also Few-shot Learners |  | 0
Bidirectional Long-Short Term Memory for Video Description |  | 0
Bidirectional Representations for Low Resource Spoken Language Understanding |  | 0
Bidirectional Trained Tree-Structured Decoder for Handwritten Mathematical Expression Recognition |  | 0
Bielik 11B v2 Technical Report |  | 0
Bielik 7B v0.1: A Polish Language Model -- Development, Insights, and Evaluation |  | 0
Bielik v3 Small: Technical Report |  | 0
BiFold: Bimanual Cloth Folding with Language Guidance |  | 0
Bifurcated Attention: Accelerating Massively Parallel Decoding with Shared Prefixes in LLMs |  | 0
Bigger, Regularized, Categorical: High-Capacity Value Functions are Efficient Multi-Task Learners |  | 0
BIG MOOD: Relating Transformers to Explicit Commonsense Knowledge |  | 0
Bi-Granularity Contrastive Learning for Post-Training in Few-Shot Scene |  | 0
CodeSSM: Towards State Space Models for Code Understanding |  | 0
Bilexical Embeddings for Quality Estimation |  | 0
Bilingual Adaptation of Monolingual Foundation Models |  | 0
Bilingual Dictionary-based Language Model Pretraining for Neural Machine Translation |  | 0
Bilingual Language Modeling, A transfer learning technique for Roman Urdu |  | 0
Bilingually-constrained Phrase Embeddings for Machine Translation |  | 0
Bilingual Structured Language Models for Statistical Machine Translation |  | 0
Bilingual Word Embeddings from Non-Parallel Document-Aligned Data Applied to Bilingual Lexicon Induction |  | 0
Billions of Parameters Are Worth More Than In-domain Training Data: A case study in the Legal Case Entailment Task |  | 0
BiLMa: Bidirectional Local-Matching for Text-based Person Re-identification |  | 0
Bi-Mamba: Towards Accurate 1-Bit State Space Models |  | 0
BIM: Block-Wise Self-Supervised Learning with Masked Image Modeling |  | 0
Binarized LSTM Language Model |  | 0
Page 135 of 353

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | Decay RNN | Validation perplexity | 76.67 |  | Unverified
2 | GRU | Validation perplexity | 53.78 |  | Unverified
3 | LSTM | Validation perplexity | 52.73 |  | Unverified
4 | LSTM | Test perplexity | 48.7 |  | Unverified
5 | Temporal CNN | Test perplexity | 45.2 |  | Unverified
6 | TCN | Test perplexity | 45.19 |  | Unverified
7 | GCNN-8 | Test perplexity | 44.9 |  | Unverified
8 | Neural cache model (size = 100) | Test perplexity | 44.8 |  | Unverified
9 | Neural cache model (size = 2,000) | Test perplexity | 40.8 |  | Unverified
10 | GPT-2 Small | Test perplexity | 37.5 |  | Unverified
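The perplexity figures in these tables are the exponential of the model's average negative log-likelihood per token; lower is better. A minimal sketch (function and variable names are illustrative):

```python
import math

def perplexity(token_log_probs):
    """exp of the average negative log-likelihood per token (natural log)."""
    n = len(token_log_probs)
    return math.exp(-sum(token_log_probs) / n)

# A model that assigns uniform probability 1/V to every token has
# perplexity exactly V: it is as uncertain as V-way random guessing.
```

On this reading, a test perplexity of 37.5 means the model is, on average, about as uncertain as a uniform choice among 37 to 38 continuations.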
# | Model | Metric | Claimed | Verified | Status
1 | TCN | Test perplexity | 108.47 |  | Unverified
2 | Seq-U-Net | Test perplexity | 107.95 |  | Unverified
3 | GRU (Bai et al., 2018) | Test perplexity | 92.48 |  | Unverified
4 | R-Transformer | Test perplexity | 84.38 |  | Unverified
5 | Zaremba et al. (2014) - LSTM (medium) | Test perplexity | 82.7 |  | Unverified
6 | Gal & Ghahramani (2016) - Variational LSTM (medium) | Test perplexity | 79.7 |  | Unverified
7 | LSTM (Bai et al., 2018) | Test perplexity | 78.93 |  | Unverified
8 | Zaremba et al. (2014) - LSTM (large) | Test perplexity | 78.4 |  | Unverified
9 | Gal & Ghahramani (2016) - Variational LSTM (large) | Test perplexity | 75.2 |  | Unverified
10 | Inan et al. (2016) - Variational RHN | Test perplexity | 66 |  | Unverified
# | Model | Metric | Claimed | Verified | Status
1 | LSTM (7 layers) | Bits per character (BPC) | 1.67 |  | Unverified
2 | Hypernetworks | Bits per character (BPC) | 1.34 |  | Unverified
3 | SHA-LSTM (4 layers, h=1024, no attention head) | Bits per character (BPC) | 1.33 |  | Unverified
4 | LN HM-LSTM | Bits per character (BPC) | 1.32 |  | Unverified
5 | ByteNet | Bits per character (BPC) | 1.31 |  | Unverified
6 | Recurrent Highway Networks | Bits per character (BPC) | 1.27 |  | Unverified
7 | Large FS-LSTM-4 | Bits per character (BPC) | 1.25 |  | Unverified
8 | Large mLSTM | Bits per character (BPC) | 1.24 |  | Unverified
9 | AWD-LSTM (3 layers) | Bits per character (BPC) | 1.23 |  | Unverified
10 | Cluster-Former (#C=512) | Bits per character (BPC) | 1.22 |  | Unverified
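Bits per character (BPC) is the character-level analogue of perplexity, reported in base 2: the average number of bits the model needs to encode each character, with character-level perplexity equal to 2^BPC. A sketch with illustrative names:

```python
import math

def bits_per_character(char_probs):
    """Average negative log2-probability the model assigned to each character."""
    return -sum(math.log2(p) for p in char_probs) / len(char_probs)

def bpc_to_char_perplexity(bpc):
    """Character-level perplexity implied by a BPC score: 2 ** BPC."""
    return 2.0 ** bpc
```

By this conversion, the spread above from 1.67 down to 1.22 BPC corresponds to character-level perplexities of roughly 3.2 down to roughly 2.3.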
# | Model | Metric | Claimed | Verified | Status
1 | Smaller Transformer 126M (pre-trained) | Test perplexity | 33 |  | Unverified
2 | OPT 125M | Test perplexity | 32.26 |  | Unverified
3 | Larger Transformer 771M (pre-trained) | Test perplexity | 28.1 |  | Unverified
4 | OPT 1.3B | Test perplexity | 19.55 |  | Unverified
5 | GPT-Neo 125M | Test perplexity | 17.83 |  | Unverified
6 | OPT 2.7B | Test perplexity | 17.81 |  | Unverified
7 | Smaller Transformer 126M (fine-tuned) | Test perplexity | 12 |  | Unverified
8 | GPT-Neo 1.3B | Test perplexity | 11.46 |  | Unverified
9 | Transformer 125M | Test perplexity | 10.7 |  | Unverified
10 | GPT-Neo 2.7B | Test perplexity | 10.44 |  | Unverified