SOTAVerified

Language Modelling

A language model is a model of natural language. Language models are useful for a variety of tasks, including speech recognition, machine translation, natural language generation (generating more human-like text), optical character recognition, handwriting recognition, grammar induction, and information retrieval.

Large language models (LLMs), currently their most advanced form, are predominantly based on transformers trained on large datasets (frequently text scraped from the public internet). They have superseded recurrent neural network-based models, which had in turn superseded purely statistical models such as word n-gram language models.

Source: Wikipedia
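
As a concrete illustration of the "purely statistical models" mentioned above, the sketch below implements a minimal word-bigram language model with add-one (Laplace) smoothing. It is a toy example under assumed names and data, not the method of any paper listed on this page.

    from collections import Counter

    def train_bigram_lm(corpus):
        """Count unigrams and bigrams over a list of tokenized sentences."""
        unigrams, bigrams = Counter(), Counter()
        for sentence in corpus:
            tokens = ["<s>"] + sentence + ["</s>"]
            unigrams.update(tokens)
            bigrams.update(zip(tokens, tokens[1:]))
        return unigrams, bigrams

    def bigram_prob(unigrams, bigrams, prev, word):
        """P(word | prev) with add-one smoothing over the observed vocabulary."""
        vocab_size = len(unigrams)
        return (bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size)

    # Toy corpus; real n-gram models are estimated from billions of tokens.
    corpus = [["the", "cat", "sat"], ["the", "dog", "sat"]]
    unigrams, bigrams = train_bigram_lm(corpus)
    print(bigram_prob(unigrams, bigrams, "the", "cat"))  # (1+1)/(2+6) = 0.25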

Papers

Showing 7051–7100 of 17610 papers

Title | Status | Hype
Fast and Robust Neural Network Joint Models for Statistical Machine Translation | — | 0
Fast and Robust Unsupervised Contextual Biasing for Speech Recognition | — | 0
Fast and Scalable Decoding with Language Model Look-Ahead for Phrase-based Statistical Machine Translation | — | 0
Fast Collocation-Based Bayesian HMM Word Alignment | — | 0
Fast Contextual Adaptation with Neural Associative Memory for On-Device Personalized Speech Recognition | — | 0
Fast-ELECTRA for Efficient Pre-training | — | 0
Fast End-to-End Speech Recognition via Non-Autoregressive Models and Cross-Modal Knowledge Transferring from BERT | — | 0
Faster Adaptive Federated Learning | — | 0
Faster, Cheaper, Better: Multi-Objective Hyperparameter Optimization for LLM and RAG Systems | — | 0
Faster Phrase-Based Decoding by Refining Feature State | — | 0
Fast Gated Neural Domain Adaptation: Language Model as a Case Study | — | 0
FastGRNN: A Fast, Accurate, Stable and Tiny Kilobyte Sized Gated Recurrent Neural Network | — | 0
FastLR: Non-Autoregressive Lipreading Model with Integrate-and-Fire | — | 0
Fast Parametric Learning with Activation Memorization | — | 0
Fast Quantum Algorithm for Attention Computation | — | 0
Fast-Slow Thinking for Large Vision-Language Model Reasoning | — | 0
Fast Syntactic Analysis for Statistical Language Modeling via Substructure Sharing and Uptraining | — | 0
Fast Text-Only Domain Adaptation of RNN-Transducer Prediction Network | — | 0
FastWave: Accelerating Autoregressive Convolutional Neural Networks on FPGA | — | 0
FA Team at the NTCIR-17 UFO Task | — | 0
Fault Diagnosis in Power Grids with Large Language Model | — | 0
FBK-UPV-UEdin participation in the WMT14 Quality Estimation shared-task | — | 0
Towards Better Few-Shot and Finetuning Performance with Forgetful Causal Language Models | — | 0
FD-LLM: Large Language Model for Fault Diagnosis of Machines | — | 0
Feasibility of BERT Embeddings For Domain-Specific Knowledge Mining | — | 0
Feasibility with Language Models for Open-World Compositional Zero-Shot Learning | — | 0
Feather the Throttle: Revisiting Visual Token Pruning for Vision-Language Model Acceleration | — | 0
Feature Aggregation in Zero-Shot Cross-Lingual Transfer Using Multilingual BERT | — | 0
Feature Alignment and Representation Transfer in Knowledge Distillation for Large Language Models | — | 0
Feature-based Neural Language Model and Chinese Word Segmentation | — | 0
Feature Decay Algorithms for Fast Deployment of Accurate Statistical Machine Translation Systems | — | 0
Feature Engineering vs BERT on Twitter Data | — | 0
Feature Extraction for Native Language Identification Using Language Modeling | — | 0
Feature Fusion Effects of Tensor Product Representation on (De)Compositional Network for Caption Generation for Images | — | 0
Feature-Learning Networks Are Consistent Across Widths At Realistic Scales | — | 0
Feature Optimization for Predicting Readability of Arabic L1 and L2 | — | 0
FedBaF: Federated Learning Aggregation Biased by a Foundation Model | — | 0
FedBoost: A Communication-Efficient Algorithm for Federated Learning | — | 0
Federated Cross-Domain Click-Through Rate Prediction With Large Language Model Augmentation | — | 0
Federated Evaluation of On-device Personalization | — | 0
Integration of Large Language Models and Federated Learning | — | 0
Federated Learning for Emoji Prediction in a Mobile Keyboard | — | 0
Federated Learning of N-gram Language Models | — | 0
Federated Learning for Personalized Humor Recognition | — | 0
Federated Reinforcement Learning with Constraint Heterogeneity | — | 0
FedMKGC: Privacy-Preserving Federated Multilingual Knowledge Graph Completion | — | 0
FedTherapist: Mental Health Monitoring with User-Generated Linguistic Expressions on Smartphones via Federated Learning | — | 0
FedTLU: Federated Learning with Targeted Layer Updates | — | 0
FedTune: A Deep Dive into Efficient Federated Fine-Tuning with Pre-trained Transformers | — | 0
Feedforward Sequential Memory Networks: A New Structure to Learn Long-term Dependency | — | 0
Page 142 of 353

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | Decay RNN | Validation perplexity | 76.67 | — | Unverified
2 | GRU | Validation perplexity | 53.78 | — | Unverified
3 | LSTM | Validation perplexity | 52.73 | — | Unverified
4 | LSTM | Test perplexity | 48.7 | — | Unverified
5 | Temporal CNN | Test perplexity | 45.2 | — | Unverified
6 | TCN | Test perplexity | 45.19 | — | Unverified
7 | GCNN-8 | Test perplexity | 44.9 | — | Unverified
8 | Neural cache model (size = 100) | Test perplexity | 44.8 | — | Unverified
9 | Neural cache model (size = 2,000) | Test perplexity | 40.8 | — | Unverified
10 | GPT-2 Small | Test perplexity | 37.5 | — | Unverified
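
For reference, the perplexity reported in these tables is the exponential of the average negative log-likelihood a model assigns to the held-out tokens (lower is better). A minimal sketch, assuming the per-token probabilities have already been extracted from some model:

    import math

    def perplexity(token_probs):
        """exp of the mean negative log-likelihood over held-out tokens."""
        nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
        return math.exp(nll)

    # Hypothetical per-token probabilities; their geometric mean is 0.1,
    # so the perplexity is exactly 10.0.
    print(perplexity([0.1, 0.2, 0.05]))
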
# | Model | Metric | Claimed | Verified | Status
1 | TCN | Test perplexity | 108.47 | — | Unverified
2 | Seq-U-Net | Test perplexity | 107.95 | — | Unverified
3 | GRU (Bai et al., 2018) | Test perplexity | 92.48 | — | Unverified
4 | R-Transformer | Test perplexity | 84.38 | — | Unverified
5 | Zaremba et al. (2014) - LSTM (medium) | Test perplexity | 82.7 | — | Unverified
6 | Gal & Ghahramani (2016) - Variational LSTM (medium) | Test perplexity | 79.7 | — | Unverified
7 | LSTM (Bai et al., 2018) | Test perplexity | 78.93 | — | Unverified
8 | Zaremba et al. (2014) - LSTM (large) | Test perplexity | 78.4 | — | Unverified
9 | Gal & Ghahramani (2016) - Variational LSTM (large) | Test perplexity | 75.2 | — | Unverified
10 | Inan et al. (2016) - Variational RHN | Test perplexity | 66 | — | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | LSTM (7 layers) | Bits per character (BPC) | 1.67 | — | Unverified
2 | Hypernetworks | Bits per character (BPC) | 1.34 | — | Unverified
3 | SHA-LSTM (4 layers, h=1024, no attention head) | Bits per character (BPC) | 1.33 | — | Unverified
4 | LN HM-LSTM | Bits per character (BPC) | 1.32 | — | Unverified
5 | ByteNet | Bits per character (BPC) | 1.31 | — | Unverified
6 | Recurrent Highway Networks | Bits per character (BPC) | 1.27 | — | Unverified
7 | Large FS-LSTM-4 | Bits per character (BPC) | 1.25 | — | Unverified
8 | Large mLSTM | Bits per character (BPC) | 1.24 | — | Unverified
9 | AWD-LSTM (3 layers) | Bits per character (BPC) | 1.23 | — | Unverified
10 | Cluster-Former (#C=512) | Bits per character (BPC) | 1.22 | — | Unverified
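
Bits per character (BPC), the metric in the table above, is the base-2 cross-entropy per character, so it corresponds to a per-character perplexity of 2^BPC. A one-line conversion, with the AWD-LSTM row as a worked example:

    def bpc_to_char_perplexity(bpc: float) -> float:
        """Per-character perplexity implied by a bits-per-character score."""
        return 2 ** bpc

    # 1.23 BPC (the AWD-LSTM row above) is a per-character perplexity of ~2.35.
    print(bpc_to_char_perplexity(1.23))
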
# | Model | Metric | Claimed | Verified | Status
1 | Smaller Transformer 126M (pre-trained) | Test perplexity | 33 | — | Unverified
2 | OPT 125M | Test perplexity | 32.26 | — | Unverified
3 | Larger Transformer 771M (pre-trained) | Test perplexity | 28.1 | — | Unverified
4 | OPT 1.3B | Test perplexity | 19.55 | — | Unverified
5 | GPT-Neo 125M | Test perplexity | 17.83 | — | Unverified
6 | OPT 2.7B | Test perplexity | 17.81 | — | Unverified
7 | Smaller Transformer 126M (fine-tuned) | Test perplexity | 12 | — | Unverified
8 | GPT-Neo 1.3B | Test perplexity | 11.46 | — | Unverified
9 | Transformer 125M | Test perplexity | 10.7 | — | Unverified
10 | GPT-Neo 2.7B | Test perplexity | 10.44 | — | Unverified
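
Numbers like the GPT-2 and OPT rows above are typically obtained by exponentiating a causal LM's mean cross-entropy on the benchmark's test split. The sketch below shows the basic recipe using the Hugging Face transformers API; the sample text is a stand-in (a faithful evaluation would use the actual test split and a sliding-window pass), so the printed value will not match the table.

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2")
    model.eval()

    # Stand-in text; a real run would stream the benchmark's test set.
    text = "Language models assign probabilities to sequences of words."
    inputs = tokenizer(text, return_tensors="pt")

    with torch.no_grad():
        # With labels == input_ids, the model returns the mean cross-entropy
        # over next-token predictions; its exp is the perplexity.
        loss = model(**inputs, labels=inputs["input_ids"]).loss

    print(torch.exp(loss).item())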