SOTAVerified

Language Modeling

Papers

Showing 3101-3150 of 14182 papers

Title | Status | Hype
Montage: A Neural Network Language Model-Guided JavaScript Engine Fuzzer | Code | 1
Improving Transformer Optimization Through Better Initialization | Code | 1
Improving Transformer Optimization Through Better Initialization | Code | 1
BERTje: A Dutch BERT Model | Code | 1
Open Domain Web Keyphrase Extraction Beyond Language Modeling | Code | 1
Unsupervised Cross-lingual Representation Learning at Scale | Code | 1
Generalization through Memorization: Nearest Neighbor Language Models | Code | 1
Masked Language Model Scoring | Code | 1
Multi-Stage Document Ranking with BERT | Code | 1
Stabilizing Transformers for Reinforcement Learning | Code | 1
Structured Pruning of Large Language Models | Code | 1
DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter | Code | 1
Reducing Transformer Depth on Demand with Structured Dropout | Code | 1
UNITER: UNiversal Image-TExt Representation Learning | Code | 1
A Critical Analysis of Biased Parsers in Unsupervised Parsing | Code | 1
Espresso: A Fast End-to-end Neural Speech Recognition Toolkit | Code | 1
Ouroboros: On Accelerating Training of Transformer-Based Language Models | Code | 1
CTRL: A Conditional Transformer Language Model for Controllable Generation | Code | 1
MultiFiT: Efficient Multi-lingual Language Model Fine-tuning | Code | 1
Improved Hierarchical Patient Classification with Language Model Pretraining over Clinical Notes | Code | 1
The Woman Worked as a Babysitter: On Biases in Language Generation | Code | 1
Deep Equilibrium Models | Code | 1
LXMERT: Learning Cross-Modality Encoder Representations from Transformers | Code | 1
VisualBERT: A Simple and Performant Baseline for Vision and Language | Code | 1
On the Variance of the Adaptive Learning Rate and Beyond | Code | 1
RoBERTa: A Robustly Optimized BERT Pretraining Approach | Code | 1
ELI5: Long Form Question Answering | Code | 1
Hello, It's GPT-2 -- How Can I Help You? Towards the Use of Pretrained Language Models for Task-Oriented Dialogue Systems | Code | 1
Evaluating Language Model Finetuning Techniques for Low-resource Languages | Code | 1
A Tensorized Transformer for Language Modeling | Code | 1
XLNet: Generalized Autoregressive Pretraining for Language Understanding | Code | 1
How multilingual is Multilingual BERT? | Code | 1
Does It Make Sense? And Why? A Pilot Study for Sense Making and Explanation | Code | 1
Adapting Text Embeddings for Causal Inference | Code | 1
Stochastic Gradient Methods with Layer-wise Adaptive Moments for Training of Deep Networks | Code | 1
Discrete Flows: Invertible Generative Models of Discrete Data | Code | 1
Adaptive Attention Span in Transformers | Code | 1
A Surprisingly Robust Trick for Winograd Schema Challenge | Code | 1
How to Fine-Tune BERT for Text Classification? | Code | 1
RWTH ASR Systems for LibriSpeech: Hybrid vs Attention -- w/o Data Augmentation | Code | 1
The Curious Case of Neural Text Degeneration | Code | 1
Mask-Predict: Parallel Decoding of Conditional Masked Language Models | Code | 1
SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition | Code | 1
fairseq: A Fast, Extensible Toolkit for Sequence Modeling | Code | 1
SciBERT: A Pretrained Language Model for Scientific Text | Code | 1
A Fully Differentiable Beam Search Decoder | Code | 1
Language Models are Unsupervised Multitask Learners | Code | 1
Pay Less Attention with Lightweight and Dynamic Convolutions | Code | 1
Passage Re-ranking with BERT | Code | 1
Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context | Code | 1
Page 63 of 284

No leaderboard results yet.