LAMBADA

Papers

Showing 1-30 of 30 papers

Title | Status | Hype
Training Compute-Optimal Large Language Models | Code | 6
Spectra: Surprising Effectiveness of Pretraining Ternary Language Models at Scale | Code | 2
Scaling Language Models: Methods, Analysis & Insights from Training Gopher | Code | 2
Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism | Code | 2
Beyond Autoregression: Fast LLMs via Self-Distillation Through Time | Code | 1
Explaining and Improving Contrastive Decoding by Extrapolating the Probabilities of a Huge and Hypothetical LM | Code | 1
The LAMBADA dataset: Word prediction requiring a broad discourse context | Code | 1
Residual Shuffle-Exchange Networks for Fast Processing of Long Sequences | Code | 1
Universal Transformers | Code | 0
The Stability-Efficiency Dilemma: Investigating Sequence Length Warmup for Training GPT Models | Code | 0
Entity Tracking Improves Cloze-style Reading Comprehension | Code | 0
Neural Shuffle-Exchange Networks -- Sequence Processing in O(n log n) Time | Code | 0
Not Enough Data? Deep Learning to the Rescue! | Code | 0
Inconsistencies in Masked Language Models | Code | 0
Headless Language Models: Learning without Predicting with Contrastive Weight Tying | | 0
Neural Models for Reasoning over Multiple Mentions using Coreference | | 0
SymBa: Symbolic Backward Chaining for Structured Natural Language Reasoning | | 0
Broad Context Language Modeling as Reading Comprehension | | 0
Attending to Entities for Better Text Understanding | | 0
AdaGC: Improving Training Stability for Large Language Model Pretraining | | 0
PIXAR: Auto-Regressive Language Modeling in Pixel Space | | 0
E.T.: Entity-Transformers. Coreference augmented Neural Language Model for richer mention representations via Entity-Transformer blocks | | 0
CoreLM: Coreference-aware Language Model Fine-Tuning | | 0
Concise and Organized Perception Facilitates Reasoning in Large Language Models | | 0
Stay on topic with Classifier-Free Guidance | | 0
LAMBADA: Backward Chaining for Automated Reasoning in Natural Language | | 0
Leveraging Relaxed Equilibrium by Lazy Transition for Sequence Modeling | | 0
Linguistic Knowledge as Memory for Recurrent Neural Networks | | 0
Matryoshka Model Learning for Improved Elastic Student Models | | 0
No leaderboard results yet.