| RoBERTa: A Robustly Optimized BERT Pretraining Approach | Jul 26, 2019 | Common Sense Reasoning, Document Image Classification | Code Available | 1 |
| ELI5: Long Form Question Answering | Jul 22, 2019 | Form, Language Modeling | Code Available | 1 |
| Hello, It's GPT-2 -- How Can I Help You? Towards the Use of Pretrained Language Models for Task-Oriented Dialogue Systems | Jul 12, 2019 | Decision Making, Language Modeling | Code Available | 1 |
| Evaluating Language Model Finetuning Techniques for Low-resource Languages | Jun 30, 2019 | Language Modeling, Language Modelling | Code Available | 1 |
| A Tensorized Transformer for Language Modeling | Jun 24, 2019 | Decoder, Language Modeling | Code Available | 1 |
| XLNet: Generalized Autoregressive Pretraining for Language Understanding | Jun 19, 2019 | Audio Question Answering, Chinese Reading Comprehension | Code Available | 1 |
| How multilingual is Multilingual BERT? | Jun 4, 2019 | Language Modeling, Language Modelling | Code Available | 1 |
| Does It Make Sense? And Why? A Pilot Study for Sense Making and Explanation | Jun 2, 2019 | Common Sense Reasoning, Language Modeling | Code Available | 1 |
| Adapting Text Embeddings for Causal Inference | May 29, 2019 | Causal Identification, Causal Inference | Code Available | 1 |
| Stochastic Gradient Methods with Layer-wise Adaptive Moments for Training of Deep Networks | May 27, 2019 | General Classification, image-classification | Code Available | 1 |
| Discrete Flows: Invertible Generative Models of Discrete Data | May 24, 2019 | Language Modeling, Language Modelling | Code Available | 1 |
| Adaptive Attention Span in Transformers | May 19, 2019 | 8k, Language Modeling | Code Available | 1 |
| A Surprisingly Robust Trick for Winograd Schema Challenge | May 15, 2019 | Common Sense Reasoning, Coreference Resolution | Code Available | 1 |
| How to Fine-Tune BERT for Text Classification? | May 14, 2019 | General Classification, Language Modeling | Code Available | 1 |
| RWTH ASR Systems for LibriSpeech: Hybrid vs Attention -- w/o Data Augmentation | May 8, 2019 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 1 |
| The Curious Case of Neural Text Degeneration | Apr 22, 2019 | Diversity, Language Modeling | Code Available | 1 |
| Mask-Predict: Parallel Decoding of Conditional Masked Language Models | Apr 19, 2019 | Language Modeling, Language Modelling | Code Available | 1 |
| SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition | Apr 18, 2019 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 1 |
| fairseq: A Fast, Extensible Toolkit for Sequence Modeling | Apr 1, 2019 | Language Modeling, Language Modelling | Code Available | 1 |
| SciBERT: A Pretrained Language Model for Scientific Text | Mar 26, 2019 | Citation Intent Classification, Dependency Parsing | Code Available | 1 |
| A Fully Differentiable Beam Search Decoder | Feb 16, 2019 | Decoder, Language Modeling | Code Available | 1 |
| Language Models are Unsupervised Multitask Learners | Feb 14, 2019 | Common Sense Reasoning, Coreference Resolution | Code Available | 1 |
| Pay Less Attention with Lightweight and Dynamic Convolutions | Jan 29, 2019 | Abstractive Text Summarization, Language Modeling | Code Available | 1 |
| Passage Re-ranking with BERT | Jan 13, 2019 | Language Modeling, Passage Re-Ranking | Code Available | 1 |
| Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context | Jan 9, 2019 | Articles, Language Modeling | Code Available | 1 |