SOTAVerified

Language Modelling

A language model is a probabilistic model of natural language: it assigns probabilities to sequences of words. Language models are useful for a variety of tasks, including speech recognition, machine translation, natural language generation (producing human-like text), optical character recognition, route optimization, handwriting recognition, grammar induction, and information retrieval.

Large language models (LLMs), currently their most advanced form, are predominantly based on transformers trained on large datasets, frequently text scraped from the public internet. They have superseded recurrent neural network-based models, which had in turn superseded purely statistical models such as word n-gram language models.
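The word n-gram models mentioned above are the simplest instance of the idea: estimate the probability of each word from counts of the words that precede it. A minimal sketch of a word-bigram model with add-one (Laplace) smoothing — the toy corpus and function names here are illustrative, not taken from any paper or benchmark on this page:

```python
from collections import Counter

# Toy corpus; a real n-gram model would be trained on a large text collection.
corpus = "the cat sat on the mat . the dog sat on the rug .".split()

unigrams = Counter(corpus)
bigrams = Counter(zip(corpus, corpus[1:]))
vocab = len(unigrams)  # number of distinct word types

def bigram_prob(prev, word):
    """P(word | prev) with add-one smoothing over the vocabulary."""
    return (bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab)

print(round(bigram_prob("the", "cat"), 3))  # → 0.167
```

Smoothing keeps unseen bigrams from receiving zero probability; RNN and transformer models replaced the count tables with learned distributed representations, which is what lets them generalize across far longer contexts.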

Source: Wikipedia

Papers

Showing 7201–7250 of 17610 papers

| Title | Status | Hype |
|-------|--------|------|
| Flexible Language Modeling in Continuous Space with Transformer-based Autoregressive Flows | | 0 |
| Flexible Model Interpretability through Natural Language Model Editing | | 0 |
| Flexibly Scaling Large Language Models Contexts Through Extensible Tokenization | | 0 |
| Flextron: Many-in-One Flexible Large Language Model | | 0 |
| FlightLLM: Efficient Large Language Model Inference with a Complete Mapping Flow on FPGAs | | 0 |
| FLORA: Formal Language Model Enables Robust Training-free Zero-shot Object Referring Analysis | | 0 |
| Flow-based generative models as iterative algorithms in probability space | | 0 |
| FlowDubber: Movie Dubbing with LLM-based Semantic-aware Learning and Flow Matching based Voice Enhancing | | 0 |
| FlowKV: A Disaggregated Inference Framework with Low-Latency KV Cache Transfer and Load-Aware Scheduling | | 0 |
| FlowPrior: Learning Expressive Priors for Latent Variable Sentence Models | | 0 |
| FltLM: An Intergrated Long-Context Large Language Model for Effective Context Filtering and Understanding | | 0 |
| FLTrojan: Privacy Leakage Attacks against Federated Language Models Through Selective Weight Tampering | | 0 |
| FLuRKA: Fast and accurate unified Low-Rank & Kernel Attention | | 0 |
| FMMformer: Efficient and Flexible Transformer via Decomposed Near-field and Far-field Attention | | 0 |
| FocusChat: Text-guided Long Video Understanding via Spatiotemporal Information Filtering | | 0 |
| Focused Contrastive Training for Test-based Constituency Analysis | | 0 |
| FOCUS: Forging Originality through Contrastive Use in Self-Plagiarism for Language Models | | 0 |
| Focusing Annotation for Semantic Role Labeling | | 0 |
| FoldGPT: Simple and Effective Large Language Model Compression Scheme | | 0 |
| FoleyGen: Visually-Guided Audio Generation | | 0 |
| FoMo Rewards: Can we cast foundation models as reward functions? | | 0 |
| FoodGPT: A Large Language Model in Food Testing Domain with Incremental Pre-training and Knowledge Graph Prompt | | 0 |
| FoodPuzzle: Developing Large Language Model Agents as Flavor Scientists | | 0 |
| FootGPT : A Large Language Model Development Experiment on a Minimal Setting | | 0 |
| Foot In The Door: Understanding Large Language Model Jailbreaking via Cognitive Psychology | | 0 |
| ∀uto∃∨∧L: Autonomous Evaluation of LLMs for Truth Maintenance and Reasoning Tasks | | 0 |
| ForceGen: End-to-end de novo protein generation based on nonlinear mechanical unfolding responses using a protein language diffusion model | | 0 |
| Forecasting Deep Learning Dynamics with Applications to Hyperparameter Tuning | | 0 |
| Forecasting from Clinical Textual Time Series: Adaptations of the Encoder and Decoder Language Model Families | | 0 |
| Forecasting Frontier Language Model Agent Capabilities | | 0 |
| Forecasting Live Chat Intent from Browsing History | | 0 |
| Forecasting People's Needs in Hurricane Events from Social Network | | 0 |
| Forecasting Rare Language Model Behaviors | | 0 |
| ForecastQA: A Question Answering Challenge for Event Forecasting with Temporal Text Data | | 0 |
| Forest-to-String SMT for Asian Language Translation: NAIST at WAT 2014 | | 0 |
| ForgeryGPT: Multimodal Large Language Model For Explainable Image Forgery Detection and Localization | | 0 |
| Forget to Flourish: Leveraging Machine-Unlearning on Pretrained Language Models for Privacy Leakage | | 0 |
| Forging Time Series with Language: A Large Language Model Approach to Synthetic Data Generation | | 0 |
| Formal Analysis of Art: Proxy Learning of Visual Concepts from Style Through Language Models | | 0 |
| Formal Aspects of Language Modeling | | 0 |
| Form follows Function: Text-to-Text Conditional Graph Generation based on Functional Requirements | | 0 |
| FormLM: Recommending Creation Ideas for Online Forms by Modelling Semantic and Structural Information | | 0 |
| FormNetV2: Multimodal Graph Contrastive Learning for Form Document Information Extraction | | 0 |
| Foundation Models and Adaptive Feature Selection: A Synergistic Approach to Video Question Answering | | 0 |
| Foundation Posteriors for Approximate Probabilistic Inference | | 0 |
| FoundationTTS: Text-to-Speech for ASR Customization with Generative Language Model | | 0 |
| Founder-GPT: Self-play to evaluate the Founder-Idea fit | | 0 |
| FPGA-Based Low-Power Speech Recognition with Recurrent Neural Networks | | 0 |
| FPM: A Collection of Large-scale Foundation Pre-trained Language Models | | 0 |
| FQuAD2.0: French Question Answering and knowing that you know nothing | | 0 |
Page 145 of 353

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
|---|-------|--------|---------|----------|--------|
| 1 | Decay RNN | Validation perplexity | 76.67 | | Unverified |
| 2 | GRU | Validation perplexity | 53.78 | | Unverified |
| 3 | LSTM | Validation perplexity | 52.73 | | Unverified |
| 4 | LSTM | Test perplexity | 48.7 | | Unverified |
| 5 | Temporal CNN | Test perplexity | 45.2 | | Unverified |
| 6 | TCN | Test perplexity | 45.19 | | Unverified |
| 7 | GCNN-8 | Test perplexity | 44.9 | | Unverified |
| 8 | Neural cache model (size = 100) | Test perplexity | 44.8 | | Unverified |
| 9 | Neural cache model (size = 2,000) | Test perplexity | 40.8 | | Unverified |
| 10 | GPT-2 Small | Test perplexity | 37.5 | | Unverified |
| # | Model | Metric | Claimed | Verified | Status |
|---|-------|--------|---------|----------|--------|
| 1 | TCN | Test perplexity | 108.47 | | Unverified |
| 2 | Seq-U-Net | Test perplexity | 107.95 | | Unverified |
| 3 | GRU (Bai et al., 2018) | Test perplexity | 92.48 | | Unverified |
| 4 | R-Transformer | Test perplexity | 84.38 | | Unverified |
| 5 | Zaremba et al. (2014) - LSTM (medium) | Test perplexity | 82.7 | | Unverified |
| 6 | Gal & Ghahramani (2016) - Variational LSTM (medium) | Test perplexity | 79.7 | | Unverified |
| 7 | LSTM (Bai et al., 2018) | Test perplexity | 78.93 | | Unverified |
| 8 | Zaremba et al. (2014) - LSTM (large) | Test perplexity | 78.4 | | Unverified |
| 9 | Gal & Ghahramani (2016) - Variational LSTM (large) | Test perplexity | 75.2 | | Unverified |
| 10 | Inan et al. (2016) - Variational RHN | Test perplexity | 66 | | Unverified |
| # | Model | Metric | Claimed | Verified | Status |
|---|-------|--------|---------|----------|--------|
| 1 | LSTM (7 layers) | Bits per Character (BPC) | 1.67 | | Unverified |
| 2 | Hypernetworks | Bits per Character (BPC) | 1.34 | | Unverified |
| 3 | SHA-LSTM (4 layers, h=1024, no attention head) | Bits per Character (BPC) | 1.33 | | Unverified |
| 4 | LN HM-LSTM | Bits per Character (BPC) | 1.32 | | Unverified |
| 5 | ByteNet | Bits per Character (BPC) | 1.31 | | Unverified |
| 6 | Recurrent Highway Networks | Bits per Character (BPC) | 1.27 | | Unverified |
| 7 | Large FS-LSTM-4 | Bits per Character (BPC) | 1.25 | | Unverified |
| 8 | Large mLSTM | Bits per Character (BPC) | 1.24 | | Unverified |
| 9 | AWD-LSTM (3 layers) | Bits per Character (BPC) | 1.23 | | Unverified |
| 10 | Cluster-Former (#C=512) | Bits per Character (BPC) | 1.22 | | Unverified |
| # | Model | Metric | Claimed | Verified | Status |
|---|-------|--------|---------|----------|--------|
| 1 | Smaller Transformer 126M (pre-trained) | Test perplexity | 33 | | Unverified |
| 2 | OPT 125M | Test perplexity | 32.26 | | Unverified |
| 3 | Larger Transformer 771M (pre-trained) | Test perplexity | 28.1 | | Unverified |
| 4 | OPT 1.3B | Test perplexity | 19.55 | | Unverified |
| 5 | GPT-Neo 125M | Test perplexity | 17.83 | | Unverified |
| 6 | OPT 2.7B | Test perplexity | 17.81 | | Unverified |
| 7 | Smaller Transformer 126M (fine-tuned) | Test perplexity | 12 | | Unverified |
| 8 | GPT-Neo 1.3B | Test perplexity | 11.46 | | Unverified |
| 9 | Transformer 125M | Test perplexity | 10.7 | | Unverified |
| 10 | GPT-Neo 2.7B | Test perplexity | 10.44 | | Unverified |
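The tables above report two metrics, perplexity and bits per character (BPC); both are transforms of the model's average negative log-likelihood on held-out text (lower is better). A minimal sketch of the arithmetic — the probability values are made up for illustration, not taken from any model listed above:

```python
import math

def perplexity(token_probs):
    """exp of the mean negative log-likelihood over tokens.

    Equivalently, the inverse geometric mean of the per-token probabilities.
    """
    nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(nll)

def bits_per_char(char_probs):
    """Mean negative log2-likelihood per character (BPC)."""
    return -sum(math.log2(p) for p in char_probs) / len(char_probs)

# Toy values: a model assigning these probabilities to four tokens
# has perplexity 10 — as "confused" as a uniform choice over 10 tokens.
print(round(perplexity([0.1, 0.2, 0.05, 0.1]), 2))  # → 10.0
print(bits_per_char([0.5, 0.25]))                   # → 1.5
```

This is why the word-level tables show values in the tens while the character-level table sits near 1.2–1.7: BPC is a log-scale quantity per character, whereas perplexity exponentiates the per-word average.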