SOTAVerified

Language Modelling

A language model is a probabilistic model of natural language. Language models are useful for a variety of tasks, including speech recognition, machine translation, natural language generation (generating more human-like text), optical character recognition, route optimization, handwriting recognition, grammar induction, and information retrieval.

Large language models (LLMs), currently their most advanced form, are predominantly based on transformers trained on large datasets (frequently using words scraped from the public internet). They have superseded recurrent neural network-based models, which had previously superseded purely statistical models, such as word n-gram language models.
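For contrast with today's neural models, a word n-gram model fits in a few lines. The following bigram model with add-one (Laplace) smoothing is a minimal illustrative sketch; the toy corpus and function names are invented for the example:

```python
from collections import Counter

def train_bigram_lm(corpus):
    """Count unigrams and bigrams; return a smoothed conditional probability."""
    unigrams = Counter()
    bigrams = Counter()
    for sentence in corpus:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    vocab_size = len(unigrams)

    def prob(prev, word):
        # P(word | prev) with add-one smoothing over the observed vocabulary
        return (bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size)

    return prob

corpus = ["the cat sat", "the dog sat", "the cat ran"]
prob = train_bigram_lm(corpus)
# "cat" follows "the" twice in the corpus, "dog" only once
assert prob("the", "cat") > prob("the", "dog")
```

Everything after "superseded" in the paragraph above is about scale: an n-gram model can only condition on the previous n-1 words, whereas transformers condition on the full context window.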

Source: Wikipedia

Papers

Showing 6101–6150 of 17610 papers

Title | Status | Hype
Applying GPGPU to Recurrent Neural Network Language Model based Fast Network Search in the Real-Time LVCSR | – | 0
Applying Pairwise Ranked Optimisation to Improve the Interpolation of Translation Models | – | 0
Applying Refusal-Vector Ablation to Llama 3.1 70B Agents | – | 0
Applying Sanskrit Concepts for Reordering in MT | – | 0
Applying SoftTriple Loss for Supervised Language Model Fine Tuning | – | 0
Applying Statistical Post-Editing to English-to-Korean Rule-based Machine Translation System | – | 0
Applying wav2vec2 for Speech Recognition on Bengali Common Voices Dataset | – | 0
Approaches to Improving Recognition of Underrepresented Named Entities in Hybrid ASR Systems | – | 0
Approximated Variational Bayesian Inverse Reinforcement Learning for Large Language Model Alignment | – | 0
Approximate Sentence Retrieval for Scalable and Efficient Example-Based Machine Translation | – | 0
Approximating mutual information of high-dimensional variables using learned representations | – | 0
AppVLM: A Lightweight Vision Language Model for Online App Control | – | 0
A Practical Examination of AI-Generated Text Detectors for Large Language Models | – | 0
A practical framework for multi-domain speech recognition and an instance sampling method to neural language modeling | – | 0
A practical perspective on connective generation | – | 0
A Predictive Model of Digital Information Engagement: Forecasting User Engagement With English Words by Incorporating Cognitive Biases, Computational Linguistics and Natural Language Processing | – | 0
A Pre-training Strategy for Zero-Resource Response Selection in Knowledge-Grounded Conversations | – | 0
A Primer on Contrastive Pretraining in Language Processing: Methods, Lessons Learned and Perspectives | – | 0
A Principled Approach to Context-Aware Machine Translation | – | 0
A Principled Framework for Knowledge-enhanced Large Language Model | – | 0
A Progressive Framework of Vision-language Knowledge Distillation and Alignment for Multilingual Scene | – | 0
A Progressive Transformer for Unifying Binary Code Embedding and Knowledge Transfer | – | 0
A Prompt Engineering Approach and a Knowledge Graph based Framework for Tackling Legal Implications of Large Language Model Answers | – | 0
A Prompt Refinement-based Large Language Model for Metro Passenger Flow Forecasting under Delay Conditions | – | 0
A Proposal of Automatic Error Correction in Text | – | 0
A Proposed Large Language Model-Based Smart Search for Archive System | – | 0
A Proposition-Based Abstractive Summariser | – | 0
A Protein Structure Prediction Approach Leveraging Transformer and CNN Integration | – | 0
A Provably Correct Learning Algorithm for Latent-Variable PCFGs | – | 0
Aptly: Making Mobile Apps from Natural Language | – | 0
A Quantitative Analysis of Comparison of Emoji Sentiment: Taiwan Mandarin Users and English Users | – | 0
A Quantitative Review on Language Model Efficiency Research | – | 0
Aquila: A Hierarchically Aligned Visual-Language Model for Enhanced Remote Sensing Image Comprehension | – | 0
Aquila-plus: Prompt-Driven Visual-Language Models for Pixel-Level Remote Sensing Image Understanding | – | 0
ArabianGPT: Native Arabic GPT-based Large Language Model | – | 0
Arabic Compact Language Modelling for Resource Limited Devices | – | 0
Arabic Diacritization with Recurrent Neural Networks | – | 0
Arabic Dialect Identification for Travel and Twitter Text | – | 0
Arabic Word Generation and Modelling for Spell Checking | – | 0
Arabizi Detection and Conversion to Arabic | – | 0
Arabizi Language Models for Sentiment Analysis | – | 0
AraLegal-BERT: A pretrained language model for Arabic Legal text | – | 0
A random forest system combination approach for error detection in digital dictionaries | – | 0
A Random Gossip BMUF Process for Neural Language Modeling | – | 0
AraPoemBERT: A Pretrained Language Model for Arabic Poetry Analysis | – | 0
ArbDialectID at MADAR Shared Task 1: Language Modelling and Ensemble Learning for Fine Grained Arabic Dialect Identification | – | 0
ARChef: An iOS-Based Augmented Reality Cooking Assistant Powered by Multimodal Gemini LLM | – | 0
Architect: Generating Vivid and Interactive 3D Scenes with Hierarchical 2D Inpainting | – | 0
Architectural Complexity Measures of Recurrent Neural Networks | – | 0
Page 123 of 353

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | Decay RNN | Validation perplexity | 76.67 | – | Unverified
2 | GRU | Validation perplexity | 53.78 | – | Unverified
3 | LSTM | Validation perplexity | 52.73 | – | Unverified
4 | LSTM | Test perplexity | 48.7 | – | Unverified
5 | Temporal CNN | Test perplexity | 45.2 | – | Unverified
6 | TCN | Test perplexity | 45.19 | – | Unverified
7 | GCNN-8 | Test perplexity | 44.9 | – | Unverified
8 | Neural cache model (size = 100) | Test perplexity | 44.8 | – | Unverified
9 | Neural cache model (size = 2,000) | Test perplexity | 40.8 | – | Unverified
10 | GPT-2 Small | Test perplexity | 37.5 | – | Unverified
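The perplexity figures in these tables are the exponential of the average per-token negative log-likelihood a model assigns to the evaluation set (lower is better). A minimal sketch of the computation, with an invented function name and toy values:

```python
import math

def perplexity(log_probs):
    """Perplexity = exp of the mean negative natural-log-likelihood per token."""
    nll = -sum(log_probs) / len(log_probs)
    return math.exp(nll)

# A model that assigns probability 0.1 to every token has perplexity 10:
# it is, on average, as uncertain as a uniform choice among 10 tokens.
lps = [math.log(0.1)] * 100
print(round(perplexity(lps), 6))  # → 10.0
```

Note that perplexities are only comparable across models evaluated with the same tokenization on the same test set, which is why each table here is reported separately.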
# | Model | Metric | Claimed | Verified | Status
1 | TCN | Test perplexity | 108.47 | – | Unverified
2 | Seq-U-Net | Test perplexity | 107.95 | – | Unverified
3 | GRU (Bai et al., 2018) | Test perplexity | 92.48 | – | Unverified
4 | R-Transformer | Test perplexity | 84.38 | – | Unverified
5 | Zaremba et al. (2014) - LSTM (medium) | Test perplexity | 82.7 | – | Unverified
6 | Gal & Ghahramani (2016) - Variational LSTM (medium) | Test perplexity | 79.7 | – | Unverified
7 | LSTM (Bai et al., 2018) | Test perplexity | 78.93 | – | Unverified
8 | Zaremba et al. (2014) - LSTM (large) | Test perplexity | 78.4 | – | Unverified
9 | Gal & Ghahramani (2016) - Variational LSTM (large) | Test perplexity | 75.2 | – | Unverified
10 | Inan et al. (2016) - Variational RHN | Test perplexity | 66 | – | Unverified
# | Model | Metric | Claimed | Verified | Status
1 | LSTM (7 layers) | Bit per Character (BPC) | 1.67 | – | Unverified
2 | Hypernetworks | Bit per Character (BPC) | 1.34 | – | Unverified
3 | SHA-LSTM (4 layers, h=1024, no attention head) | Bit per Character (BPC) | 1.33 | – | Unverified
4 | LN HM-LSTM | Bit per Character (BPC) | 1.32 | – | Unverified
5 | ByteNet | Bit per Character (BPC) | 1.31 | – | Unverified
6 | Recurrent Highway Networks | Bit per Character (BPC) | 1.27 | – | Unverified
7 | Large FS-LSTM-4 | Bit per Character (BPC) | 1.25 | – | Unverified
8 | Large mLSTM | Bit per Character (BPC) | 1.24 | – | Unverified
9 | AWD-LSTM (3 layers) | Bit per Character (BPC) | 1.23 | – | Unverified
10 | Cluster-Former (#C=512) | Bit per Character (BPC) | 1.22 | – | Unverified
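Bit per Character (BPC) is the analogous metric for character-level models: the mean negative log2-probability per character, so a BPC of b corresponds to a per-character perplexity of 2^b. A minimal sketch, with an invented function name and toy values:

```python
import math

def bpc_from_char_log_probs(log_probs):
    """BPC = mean negative log2-probability per character (natural logs in)."""
    return -sum(lp / math.log(2) for lp in log_probs) / len(log_probs)

# Probability 0.5 per character is exactly 1 bit of surprise per character,
# i.e. BPC 1.0, equivalent to a per-character perplexity of 2**1.0 = 2.
lps = [math.log(0.5)] * 8
bpc = bpc_from_char_log_probs(lps)
assert abs(bpc - 1.0) < 1e-9
assert abs(2 ** bpc - 2.0) < 1e-9
```

Character-level BPC values look far smaller than word-level perplexities only because each prediction is over a tiny character vocabulary; the two metrics are not directly comparable.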
# | Model | Metric | Claimed | Verified | Status
1 | Smaller Transformer 126M (pre-trained) | Test perplexity | 33 | – | Unverified
2 | OPT 125M | Test perplexity | 32.26 | – | Unverified
3 | Larger Transformer 771M (pre-trained) | Test perplexity | 28.1 | – | Unverified
4 | OPT 1.3B | Test perplexity | 19.55 | – | Unverified
5 | GPT-Neo 125M | Test perplexity | 17.83 | – | Unverified
6 | OPT 2.7B | Test perplexity | 17.81 | – | Unverified
7 | Smaller Transformer 126M (fine-tuned) | Test perplexity | 12 | – | Unverified
8 | GPT-Neo 1.3B | Test perplexity | 11.46 | – | Unverified
9 | Transformer 125M | Test perplexity | 10.7 | – | Unverified
10 | GPT-Neo 2.7B | Test perplexity | 10.44 | – | Unverified