SOTAVerified

Language Modelling

A language model is a probabilistic model of natural language that assigns probabilities to sequences of words. Language models are useful for a variety of tasks, including speech recognition, machine translation, natural language generation (producing more human-like text), optical character recognition, route optimization, handwriting recognition, grammar induction, and information retrieval.

Large language models (LLMs), currently their most advanced form, are predominantly based on transformers trained on larger datasets (frequently using text scraped from the public internet). They have superseded recurrent neural network-based models, which had previously superseded purely statistical models such as word n-gram language models.

Source: Wikipedia
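
The purely statistical word n-gram approach mentioned above reduces language modelling to counting. As an illustrative sketch (not taken from this page; the tiny corpus is made up), a maximum-likelihood bigram model looks like this:

```python
from collections import Counter, defaultdict

# Toy corpus, purely for demonstration.
corpus = "the cat sat on the mat the cat ate".split()

# Count how often each word is followed by each next word.
bigram_counts = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    bigram_counts[prev][nxt] += 1

def prob(prev, nxt):
    """Maximum-likelihood estimate of P(nxt | prev)."""
    total = sum(bigram_counts[prev].values())
    return bigram_counts[prev][nxt] / total if total else 0.0

print(prob("the", "cat"))  # "the" is followed by "cat" in 2 of its 3 occurrences
```

Real n-gram systems add smoothing for unseen pairs; RNN- and transformer-based models replace the count table with a learned conditional distribution over the whole preceding context.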

Papers

Showing 9401–9450 of 17610 papers

Title | Status | Hype
ATTNChecker: Highly-Optimized Fault Tolerant Attention for Large Language Model Training |  | 0
Lightweight Large Language Model for Medication Enquiry: Med-Pal |  | 0
Lightweight Neural App Control |  | 0
Lightweight Unsupervised Federated Learning with Pretrained Vision Language Model |  | 0
Like a bilingual baby: The advantage of visually grounding a bilingual language model |  | 0
Likelihood Variance as Text Importance for Resampling Texts to Map Language Models |  | 0
LiLiuM: eBay's Large Language Models for e-commerce |  | 0
LiLM-RDB-SFC: Lightweight Language Model with Relational Database-Guided DRL for Optimized SFC Provisioning |  | 0
LiMe: a Latin Corpus of Late Medieval Criminal Sentences |  | 0
Limits of Detecting Text Generated by Large-Scale Language Models |  | 0
LIMSIILES: Basic English Substitution for Student Answer Assessment at SemEval 2013 |  | 0
LIMSI@IWSLT’16: MT Track |  | 0
LIMSI @ WMT13 |  | 0
LIMSI @ WMT'14 Medical Translation Task |  | 0
LIMSI@WMT'17 |  | 0
Linear Attention via Orthogonal Memory |  | 0
Linearizing Transformer with Key-Value Memory |  | 0
Generation of 3D Molecules in Pockets via Language Model |  | 0
Linguacodus: A Synergistic Framework for Transformative Code Generation in Machine Learning Pipelines |  | 0
LinguAlchemy: Fusing Typological and Geographical Elements for Unseen Language Generalization |  | 0
LinguaLinked: A Distributed Large Language Model Inference System for Mobile Devices |  | 0
Lingua Manga: A Generic Large Language Model Centric System for Data Curation |  | 0
Linguistically Informed ChatGPT Prompts to Enhance Japanese-Chinese Machine Translation: A Case Study on Attributive Clauses |  | 0
Linguistically Inspired Language Model Augmentation for MT |  | 0
Linguistic Analysis Processing Line for Bulgarian |  | 0
Linguistic-Enhanced Transformer with CTC Embedding for Speech Recognition |  | 0
Linguistic Entity Masking to Improve Cross-Lingual Representation of Multilingual Language Models for Low-Resource Languages |  | 0
Linguistic Knowledge and Transferability of Contextual Representations |  | 0
Linguistic Profiling of a Neural Language Model |  | 0
Linguistic Regularities in Continuous Space Word Representations |  | 0
Linguistic Structured Sparsity in Text Categorization |  | 0
LINGUIST: Language Model Instruction Tuning to Generate Annotated Utterances for Intent Classification and Slot Tagging |  | 0
Link Prediction via Graph Attention Network |  | 0
LINKs: Large Language Model Integrated Management for 6G Empowered Digital Twin NetworKs |  | 0
[Lions: 1] and [Tigers: 2] and [Bears: 3], Oh My! Literary Coreference Annotation with LLMs |  | 0
LipidBERT: A Lipid Language Model Pre-trained on METiS de novo Lipid Library |  | 0
Listen, Attend, Spell and Adapt: Speaker Adapted Sequence-to-Sequence ASR |  | 0
Listen, Chat, and Remix: Text-Guided Soundscape Remixing for Enhanced Auditory Experience |  | 0
Listen to the Context: Towards Faithful Large Language Models for Retrieval Augmented Generation on Climate Questions |  | 0
LiteVL: Efficient Video-Language Learning with Enhanced Spatial-Temporal Modeling |  | 0
LiteVLM: A Low-Latency Vision-Language Model Inference Pipeline for Resource-Constrained Environments |  | 0
LIT: Large Language Model Driven Intention Tracking for Proactive Human-Robot Collaboration -- A Robot Sous-Chef Application |  | 0
LittleBird: Efficient Faster & Longer Transformer for Question Answering |  | 0
LIUM's SMT Machine Translation Systems for WMT 2012 |  | 0
LiveCC: Learning Video LLM with Streaming Speech Transcription at Scale |  | 0
Lizard: An Efficient Linearization Framework for Large Language Models |  | 0
LLaDA 1.5: Variance-Reduced Preference Optimization for Large Language Diffusion Models |  | 0
LLaDA-V: Large Language Diffusion Models with Visual Instruction Tuning |  | 0
Llama-3.1-Sherkala-8B-Chat: An Open Large Language Model for Kazakh |  | 0
LLaMA based Punctuation Restoration With Forward Pass Only Decoding |  | 0
Page 189 of 353

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | Decay RNN | Validation perplexity | 76.67 |  | Unverified
2 | GRU | Validation perplexity | 53.78 |  | Unverified
3 | LSTM | Validation perplexity | 52.73 |  | Unverified
4 | LSTM | Test perplexity | 48.7 |  | Unverified
5 | Temporal CNN | Test perplexity | 45.2 |  | Unverified
6 | TCN | Test perplexity | 45.19 |  | Unverified
7 | GCNN-8 | Test perplexity | 44.9 |  | Unverified
8 | Neural cache model (size = 100) | Test perplexity | 44.8 |  | Unverified
9 | Neural cache model (size = 2,000) | Test perplexity | 40.8 |  | Unverified
10 | GPT-2 Small | Test perplexity | 37.5 |  | Unverified
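
Test perplexity, the metric in the tables above, is the exponential of the average negative log-likelihood the model assigns to the held-out tokens; lower is better. A minimal sketch (the per-token probabilities below are made up for illustration, not from any model in the tables):

```python
import math

# Hypothetical P(token | context) assigned by a model to four test tokens.
token_probs = [0.2, 0.1, 0.05, 0.3]

# Average negative log-likelihood (cross-entropy in nats per token).
nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
perplexity = math.exp(nll)
print(round(perplexity, 2))  # ~7.6: the model is as uncertain as a uniform
                             # choice among ~7.6 tokens at each step
```
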
# | Model | Metric | Claimed | Verified | Status
1 | TCN | Test perplexity | 108.47 |  | Unverified
2 | Seq-U-Net | Test perplexity | 107.95 |  | Unverified
3 | GRU (Bai et al., 2018) | Test perplexity | 92.48 |  | Unverified
4 | R-Transformer | Test perplexity | 84.38 |  | Unverified
5 | Zaremba et al. (2014) - LSTM (medium) | Test perplexity | 82.7 |  | Unverified
6 | Gal & Ghahramani (2016) - Variational LSTM (medium) | Test perplexity | 79.7 |  | Unverified
7 | LSTM (Bai et al., 2018) | Test perplexity | 78.93 |  | Unverified
8 | Zaremba et al. (2014) - LSTM (large) | Test perplexity | 78.4 |  | Unverified
9 | Gal & Ghahramani (2016) - Variational LSTM (large) | Test perplexity | 75.2 |  | Unverified
10 | Inan et al. (2016) - Variational RHN | Test perplexity | 66 |  | Unverified
# | Model | Metric | Claimed | Verified | Status
1 | LSTM (7 layers) | Bits per Character (BPC) | 1.67 |  | Unverified
2 | Hypernetworks | Bits per Character (BPC) | 1.34 |  | Unverified
3 | SHA-LSTM (4 layers, h=1024, no attention head) | Bits per Character (BPC) | 1.33 |  | Unverified
4 | LN HM-LSTM | Bits per Character (BPC) | 1.32 |  | Unverified
5 | ByteNet | Bits per Character (BPC) | 1.31 |  | Unverified
6 | Recurrent Highway Networks | Bits per Character (BPC) | 1.27 |  | Unverified
7 | Large FS-LSTM-4 | Bits per Character (BPC) | 1.25 |  | Unverified
8 | Large mLSTM | Bits per Character (BPC) | 1.24 |  | Unverified
9 | AWD-LSTM (3 layers) | Bits per Character (BPC) | 1.23 |  | Unverified
10 | Cluster-Former (#C=512) | Bits per Character (BPC) | 1.22 |  | Unverified
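
Bits per character (BPC), the metric in the character-level table above, is the average negative base-2 log-probability per character; it is the character-level analogue of perplexity. As a sketch of the relationship (the helper names here are illustrative, not from this page):

```python
import math

def char_nll_to_bpc(avg_nll_nats):
    """Convert average per-character cross-entropy in nats to bits per character."""
    return avg_nll_nats / math.log(2)

def bpc_to_effective_choices(bpc):
    """Per-character perplexity: 2**BPC equally likely characters per step."""
    return 2 ** bpc

# E.g. the 1.22 BPC claimed for Cluster-Former corresponds to choosing
# among roughly 2**1.22 ≈ 2.33 equally likely characters at each step.
print(round(bpc_to_effective_choices(1.22), 2))
```
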
# | Model | Metric | Claimed | Verified | Status
1 | Smaller Transformer 126M (pre-trained) | Test perplexity | 33 |  | Unverified
2 | OPT 125M | Test perplexity | 32.26 |  | Unverified
3 | Larger Transformer 771M (pre-trained) | Test perplexity | 28.1 |  | Unverified
4 | OPT 1.3B | Test perplexity | 19.55 |  | Unverified
5 | GPT-Neo 125M | Test perplexity | 17.83 |  | Unverified
6 | OPT 2.7B | Test perplexity | 17.81 |  | Unverified
7 | Smaller Transformer 126M (fine-tuned) | Test perplexity | 12 |  | Unverified
8 | GPT-Neo 1.3B | Test perplexity | 11.46 |  | Unverified
9 | Transformer 125M | Test perplexity | 10.7 |  | Unverified
10 | GPT-Neo 2.7B | Test perplexity | 10.44 |  | Unverified