SOTAVerified

Language Modelling

A language model is a model of natural language. Language models are useful for a variety of tasks, including speech recognition, machine translation, natural language generation (generating more human-like text), optical character recognition, route optimization, handwriting recognition, grammar induction, and information retrieval.

Large language models (LLMs), currently their most advanced form, are predominantly based on transformers trained on large datasets (frequently using words scraped from the public internet). They have superseded recurrent neural network-based models, which had previously superseded purely statistical models such as word n-gram language models.

Source: Wikipedia
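The word n-gram models mentioned above assign each word a probability conditioned on the preceding n−1 words, estimated from corpus counts. A minimal bigram (n=2) sketch, using maximum-likelihood estimates with no smoothing (a real system would add smoothing such as Kneser–Ney):

```python
from collections import Counter

def train_bigram_lm(corpus):
    """Count unigram and bigram frequencies from a list of token lists."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in corpus:
        tokens = ["<s>"] + sentence + ["</s>"]
        unigrams.update(tokens[:-1])                     # contexts
        bigrams.update(zip(tokens[:-1], tokens[1:]))     # (prev, word) pairs
    return unigrams, bigrams

def bigram_prob(unigrams, bigrams, prev, word):
    """Maximum-likelihood estimate of P(word | prev)."""
    if unigrams[prev] == 0:
        return 0.0
    return bigrams[(prev, word)] / unigrams[prev]

corpus = [["the", "cat", "sat"], ["the", "dog", "sat"]]
uni, bi = train_bigram_lm(corpus)
print(bigram_prob(uni, bi, "the", "cat"))  # 0.5 ("the" is followed by "cat" half the time)
```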

Papers

Showing 4001–4050 of 17610 papers

Title | Status | Hype
An Embarrassingly Simple Method to Mitigate Undesirable Properties of Pretrained Language Model Tokenizers | Code | 1
DeepInception: Hypnotize Large Language Model to Be Jailbreaker | Code | 1
Mitigating Fine-tuning based Jailbreak Attack with Backdoor Enhanced Safety Alignment | Code | 1
BEAR: A Unified Framework for Evaluating Relational Knowledge in Causal and Masked Language Models | Code | 1
XG-NID: Dual-Modality Network Intrusion Detection using a Heterogeneous Graph Neural Network and Large Language Model | Code | 1
Measuring Implicit Bias in Explicitly Unbiased Large Language Models | Code | 1
A Cheaper and Better Diffusion Language Model with Soft-Masked Noise | Code | 1
MIRAGE-Bench: Automatic Multilingual Benchmark Arena for Retrieval-Augmented Generation Systems | Code | 1
DeeperImpact: Optimizing Sparse Learned Index Structures | Code | 1
XLM-K: Improving Cross-Lingual Language Model Pre-training with Multilingual Knowledge | Code | 1
Mitigating Hallucination for Large Vision Language Model by Inter-Modality Correlation Calibration Decoding | Code | 1
Mixture of Attention Heads: Selecting Attention Heads Per Token | Code | 1
An Efficient Self-Supervised Cross-View Training For Sentence Embedding | Code | 1
MiniGPT-Pancreas: Multimodal Large Language Model for Pancreas Cancer Classification and Detection | Code | 1
End-to-End Beam Retrieval for Multi-Hop Question Answering | Code | 1
An Efficient Multilingual Language Model Compression through Vocabulary Trimming | Code | 1
BAMBOO: A Comprehensive Benchmark for Evaluating Long Text Modeling Capacities of Large Language Models | Code | 1
Deep contextualized word representations | Code | 1
Decrypting Cryptic Crosswords: Semantically Complex Wordplay Puzzles as a Target for NLP | Code | 1
Dataflow Analysis-Inspired Deep Learning for Efficient Vulnerability Detection | Code | 1
Decoding-Time Language Model Alignment with Multiple Objectives | Code | 1
Mélange: Cost Efficient Large Language Model Serving by Exploiting GPU Heterogeneity | Code | 1
Decoding Speculative Decoding | Code | 1
Mementos: A Comprehensive Benchmark for Multimodal Large Language Model Reasoning over Image Sequences | Code | 1
Decoupled Visual Interpretation and Linguistic Reasoning for Math Problem Solving | Code | 1
DUMA: Reading Comprehension with Transposition Thinking | Code | 1
Dependency Transformer Grammars: Integrating Dependency Structures into Transformer Language Models | Code | 1
Decouple knowledge from parameters for plug-and-play language modeling | Code | 1
BanglaNLG and BanglaT5: Benchmarks and Resources for Evaluating Low-Resource Natural Language Generation in Bangla | Code | 1
Debiasing the Cloze Task in Sequential Recommendation with Bidirectional Transformers | Code | 1
Deep Equilibrium Models | Code | 1
DiLM: Distilling Dataset into Language Model for Text-level Dataset Distillation | Code | 1
Memory-Based Model Editing at Scale | Code | 1
MetaICL: Learning to Learn In Context | Code | 1
Modelling Suspense in Short Stories as Uncertainty Reduction over Neural Representation | Code | 1
Me, Myself, and AI: The Situational Awareness Dataset (SAD) for LLMs | Code | 1
Language Model Alignment in Multilingual Trolley Problems | Code | 1
ZC3: Zero-Shot Cross-Language Code Clone Detection | Code | 1
On Faithfulness and Factuality in Abstractive Summarization | Code | 1
Merging Feed-Forward Sublayers for Compressed Transformers | Code | 1
Pre-Trained Language Models Augmented with Synthetic Scanpaths for Natural Language Understanding | Code | 1
Separating Tongue from Thought: Activation Patching Reveals Language-Agnostic Concept Representations in Transformers | Code | 1
An Effective Deployment of Diffusion LM for Data Augmentation in Low-Resource Sentiment Classification | Code | 0
Bayesian Parameter-Efficient Fine-Tuning for Overcoming Catastrophic Forgetting | Code | 0
MILL: Mutual Verification with Large Language Models for Zero-Shot Query Expansion | Code | 0
Bayesian Neural Network Language Modeling for Speech Recognition | Code | 0
An EcoSage Assistant: Towards Building A Multimodal Plant Care Dialogue Assistant | Code | 0
MIMO: A Medical Vision Language Model with Visual Referring Multimodal Input and Pixel Grounding Multimodal Output | Code | 0
Improving the Data-efficiency of Reinforcement Learning by Warm-starting with LLM | Code | 0
Logit Separability-Driven Samples and Multiple Class-Related Words Selection for Advancing In-Context Learning | Code | 0
Page 81 of 353

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | Decay RNN | Validation perplexity | 76.67 | | Unverified
2 | GRU | Validation perplexity | 53.78 | | Unverified
3 | LSTM | Validation perplexity | 52.73 | | Unverified
4 | LSTM | Test perplexity | 48.7 | | Unverified
5 | Temporal CNN | Test perplexity | 45.2 | | Unverified
6 | TCN | Test perplexity | 45.19 | | Unverified
7 | GCNN-8 | Test perplexity | 44.9 | | Unverified
8 | Neural cache model (size = 100) | Test perplexity | 44.8 | | Unverified
9 | Neural cache model (size = 2,000) | Test perplexity | 40.8 | | Unverified
10 | GPT-2 Small | Test perplexity | 37.5 | | Unverified
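Test perplexity, the metric in the tables above, is the exponentiated mean negative log-likelihood of the test tokens under the model; lower is better, and a perplexity of k roughly means the model is as uncertain as a uniform choice among k tokens. A minimal computation, assuming the per-token probabilities the model assigned are available:

```python
import math

def perplexity(token_probs):
    """exp of the mean negative log-likelihood over test tokens.

    token_probs: the probability the model assigned to each observed token.
    """
    nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(nll)

# A model assigning probability 1/4 to every token has perplexity 4.
print(perplexity([0.25, 0.25, 0.25, 0.25]))  # 4.0
```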
# | Model | Metric | Claimed | Verified | Status
1 | TCN | Test perplexity | 108.47 | | Unverified
2 | Seq-U-Net | Test perplexity | 107.95 | | Unverified
3 | GRU (Bai et al., 2018) | Test perplexity | 92.48 | | Unverified
4 | R-Transformer | Test perplexity | 84.38 | | Unverified
5 | Zaremba et al. (2014) - LSTM (medium) | Test perplexity | 82.7 | | Unverified
6 | Gal & Ghahramani (2016) - Variational LSTM (medium) | Test perplexity | 79.7 | | Unverified
7 | LSTM (Bai et al., 2018) | Test perplexity | 78.93 | | Unverified
8 | Zaremba et al. (2014) - LSTM (large) | Test perplexity | 78.4 | | Unverified
9 | Gal & Ghahramani (2016) - Variational LSTM (large) | Test perplexity | 75.2 | | Unverified
10 | Inan et al. (2016) - Variational RHN | Test perplexity | 66 | | Unverified
# | Model | Metric | Claimed | Verified | Status
1 | LSTM (7 layers) | Bit per Character (BPC) | 1.67 | | Unverified
2 | Hypernetworks | Bit per Character (BPC) | 1.34 | | Unverified
3 | SHA-LSTM (4 layers, h=1024, no attention head) | Bit per Character (BPC) | 1.33 | | Unverified
4 | LN HM-LSTM | Bit per Character (BPC) | 1.32 | | Unverified
5 | ByteNet | Bit per Character (BPC) | 1.31 | | Unverified
6 | Recurrent Highway Networks | Bit per Character (BPC) | 1.27 | | Unverified
7 | Large FS-LSTM-4 | Bit per Character (BPC) | 1.25 | | Unverified
8 | Large mLSTM | Bit per Character (BPC) | 1.24 | | Unverified
9 | AWD-LSTM (3 layers) | Bit per Character (BPC) | 1.23 | | Unverified
10 | Cluster-Former (#C=512) | Bit per Character (BPC) | 1.22 | | Unverified
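Bits per character (BPC), the metric in the table above, is the mean negative base-2 log-likelihood per character; it is the character-level analogue of perplexity, related by per-character perplexity = 2**BPC. A minimal sketch, assuming per-character model probabilities are available:

```python
import math

def bits_per_character(char_probs):
    """Mean -log2 probability over the characters of the test text."""
    return -sum(math.log2(p) for p in char_probs) / len(char_probs)

bpc = bits_per_character([0.5, 0.25, 0.5, 0.25])
print(bpc)       # 1.5
print(2 ** bpc)  # per-character perplexity, about 2.83
```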
# | Model | Metric | Claimed | Verified | Status
1 | Smaller Transformer 126M (pre-trained) | Test perplexity | 33 | | Unverified
2 | OPT 125M | Test perplexity | 32.26 | | Unverified
3 | Larger Transformer 771M (pre-trained) | Test perplexity | 28.1 | | Unverified
4 | OPT 1.3B | Test perplexity | 19.55 | | Unverified
5 | GPT-Neo 125M | Test perplexity | 17.83 | | Unverified
6 | OPT 2.7B | Test perplexity | 17.81 | | Unverified
7 | Smaller Transformer 126M (fine-tuned) | Test perplexity | 12 | | Unverified
8 | GPT-Neo 1.3B | Test perplexity | 11.46 | | Unverified
9 | Transformer 125M | Test perplexity | 10.7 | | Unverified
10 | GPT-Neo 2.7B | Test perplexity | 10.44 | | Unverified