SOTAVerified

One General Teacher for Multi-Data Multi-Task: A New Knowledge Distillation Framework for Discourse Relation Analysis

2021-11-16ACL ARR November 2021Unverified0· sign in to hype

Anonymous

Unverified — Be the first to reproduce this paper.

Reproduce

Abstract

Automatically identifying the discourse relations can help many downstream NLP tasks such as reading comprehension. It can be categorized into explicit and implicit discourse relation recognition (EDRR and IDRR). Due to the lack of connectives, IDRR remains to be a big challenge. A good number of methods have been developed to combine explicit data with implicit ones under the multi-task learning framework. However, the difference in linguistic property and class distribution makes it hard to directly optimize EDRR and IDRR with multi-task learning. In this paper, we take the first step to exploit the knowledge distillation (KD) technique for discourse relation analysis. Our target is to train a focused single-data single-task student with the help of a general multi-data multi-task teacher. Specifically, we first train one teacher for both the top and second level relation classification tasks with explicit and implicit data. We then transfer the feature embeddings and soft labels from the teacher network to the student network. Extensive experimental results on the popular PDTB dataset proves that our model achieves a new state-of-the-art performance. We also show the effectiveness of our proposed KD architecture through detailed analysis.

Tasks

Reproductions