LAMBADA

Papers

Showing 1–30 of 30 papers

Title | Status | Hype
Training Compute-Optimal Large Language Models | Code | 6
Spectra: Surprising Effectiveness of Pretraining Ternary Language Models at Scale | Code | 2
Scaling Language Models: Methods, Analysis & Insights from Training Gopher | Code | 2
Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism | Code | 2
Residual Shuffle-Exchange Networks for Fast Processing of Long Sequences | Code | 1
Explaining and Improving Contrastive Decoding by Extrapolating the Probabilities of a Huge and Hypothetical LM | Code | 1
The LAMBADA dataset: Word prediction requiring a broad discourse context | Code | 1
Beyond Autoregression: Fast LLMs via Self-Distillation Through Time | Code | 1
Leveraging Relaxed Equilibrium by Lazy Transition for Sequence Modeling | | 0
Linguistic Knowledge as Memory for Recurrent Neural Networks | | 0
Matryoshka Model Learning for Improved Elastic Student Models | | 0
Neural Models for Reasoning over Multiple Mentions using Coreference | | 0
AdaGC: Improving Training Stability for Large Language Model Pretraining | | 0
Attending to Entities for Better Text Understanding | | 0
Broad Context Language Modeling as Reading Comprehension | | 0
Concise and Organized Perception Facilitates Reasoning in Large Language Models | | 0
CoreLM: Coreference-aware Language Model Fine-Tuning | | 0
E.T.: Entity-Transformers. Coreference augmented Neural Language Model for richer mention representations via Entity-Transformer blocks | | 0
Headless Language Models: Learning without Predicting with Contrastive Weight Tying | | 0
LAMBADA: Backward Chaining for Automated Reasoning in Natural Language | | 0
PIXAR: Auto-Regressive Language Modeling in Pixel Space | | 0
Stay on topic with Classifier-Free Guidance | | 0
SymBa: Symbolic Backward Chaining for Structured Natural Language Reasoning | | 0
Neural Shuffle-Exchange Networks - Sequence Processing in O(n log n) Time | Code | 0
The Stability-Efficiency Dilemma: Investigating Sequence Length Warmup for Training GPT Models | Code | 0
Universal Transformers | Code | 0
Inconsistencies in Masked Language Models | Code | 0
Entity Tracking Improves Cloze-style Reading Comprehension | Code | 0
Not Enough Data? Deep Learning to the Rescue! | Code | 0

No leaderboard results yet.