| Long-Short Transformer: Efficient Transformers for Language and Vision | Jul 5, 2021 | Language Modeling | Code Available | 1 |
| Robust End-to-End Offline Chinese Handwriting Text Page Spotter with Text Kernel | Jul 4, 2021 | Language Modeling | Code Available | 1 |
| R2D2: Recursive Transformer based on Differentiable Tree for Interpretable Hierarchical Language Modeling | Jul 2, 2021 | Language Modeling | Code Available | 1 |
| XLM-E: Cross-lingual Language Model Pre-training via ELECTRA | Jun 30, 2021 | Language Modeling | Code Available | 1 |
| ChineseBERT: Chinese Pretraining Enhanced by Glyph and Pinyin Information | Jun 30, 2021 | Language Modeling | Code Available | 1 |
| R-Drop: Regularized Dropout for Neural Networks | Jun 28, 2021 | Abstractive Text Summarization, image-classification | Code Available | 1 |
| Stabilizing Equilibrium Models by Jacobian Regularization | Jun 28, 2021 | Language Modeling | Code Available | 1 |
| SymbolicGPT: A Generative Transformer Model for Symbolic Regression | Jun 27, 2021 | Language Modeling | Code Available | 1 |
| CLIP2Video: Mastering Video-Text Retrieval via Image CLIP | Jun 21, 2021 | Language Modeling | Code Available | 1 |
| BitFit: Simple Parameter-efficient Fine-tuning for Transformer-based Masked Language-models | Jun 18, 2021 | Language Modeling | Code Available | 1 |
| Golos: Russian Dataset for Speech Research | Jun 18, 2021 | Automatic Speech Recognition (ASR), Language Modeling | Code Available | 1 |
| SPBERT: An Efficient Pre-training BERT on SPARQL Queries for Question Answering over Knowledge Graphs | Jun 18, 2021 | Decoder, Knowledge Graphs | Code Available | 1 |
| Distributed Deep Learning in Open Collaborations | Jun 18, 2021 | Deep Learning, Language Modeling | Code Available | 1 |
| Scene Transformer: A unified architecture for predicting multiple agent trajectories | Jun 15, 2021 | Autonomous Driving, Language Modeling | Code Available | 1 |
| Direction is what you need: Improving Word Embedding Compression in Large Language Models | Jun 15, 2021 | Language Modeling | Code Available | 1 |
| Incorporating External POS Tagger for Punctuation Restoration | Jun 12, 2021 | Automatic Speech Recognition (ASR) | Code Available | 1 |
| BioELECTRA: Pretrained Biomedical text Encoder using Discriminators | Jun 11, 2021 | Articles, Language Modeling | Code Available | 1 |
| Improving Pretrained Cross-Lingual Language Models via Self-Labeled Word Alignment | Jun 11, 2021 | Denoising, Language Modeling | Code Available | 1 |
| Convolutions and Self-Attention: Re-interpreting Relative Positions in Pre-trained Language Models | Jun 10, 2021 | Language Modeling | Code Available | 1 |
| Ultra-Fine Entity Typing with Weak Supervision from a Masked Language Model | Jun 8, 2021 | Entity Typing, Language Modeling | Code Available | 1 |
| Staircase Attention for Recurrent Processing of Sequences | Jun 8, 2021 | Language Modeling | Code Available | 1 |
| Parameter-efficient Multi-task Fine-tuning for Transformers via Shared Hypernetworks | Jun 8, 2021 | Domain Generalization, Language Modeling | Code Available | 1 |
| Top-KAST: Top-K Always Sparse Training | Jun 7, 2021 | Language Modeling | Code Available | 1 |
| Enabling Lightweight Fine-tuning for Pre-trained Language Model Compression based on Matrix Product Operators | Jun 4, 2021 | Language Modeling | Code Available | 1 |
| Luna: Linear Unified Nested Attention | Jun 3, 2021 | Language Modeling | Code Available | 1 |