LordBERT: Embedding Long Text by Segment Ordering with BERT

2021-11-16 · ACL ARR November 2021

Anonymous

Abstract

Although BERT has achieved significant improvements on many downstream NLP tasks, it has difficulty handling long text because of its quadratic computational complexity. A typical approach to this issue is to split the input into shorter segments and use an order-independent attention mechanism for inter-segment interaction, but this approach discards the segment order information, which is greatly beneficial for capturing implicit relations across segments. To address this problem, we propose a novel multi-task learning framework, named LordBERT, which fully exploits both intra- and inter-segment information in long text by segment ordering with BERT. LordBERT learns segment-level representations from segments through BERT and a reasoner, and uses an auxiliary segment ordering module to reorder shuffled segments. Through this module, the model implicitly encodes inter-segment relations and the global information of the long text into segment representations. The downstream task and the ordering task are jointly optimized during training, while at inference we mainly perform the downstream task. Experimental results show that LordBERT outperforms state-of-the-art models by up to 0.94% in accuracy on long-text classification tasks.
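The joint training objective described above can be sketched as a weighted sum of the downstream loss and the auxiliary ordering loss, where each shuffled segment's representation is scored against its original position. The following minimal, stdlib-only sketch illustrates that objective; the function names, the per-segment position classifiers, and the weighting factor `lam` are illustrative assumptions, not the paper's implementation:

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of raw scores.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def cross_entropy(logits, target):
    # Negative log-likelihood of the target class under softmax(logits).
    return -math.log(softmax(logits)[target])

def joint_loss(task_logits, task_label, order_logits, true_positions, lam=1.0):
    """Hypothetical LordBERT-style training objective.

    task_logits:    scores for the downstream task (e.g. classification).
    order_logits:   one score vector per shuffled segment, scoring each
                    candidate original position (the auxiliary ordering task).
    true_positions: the original position of each shuffled segment.
    lam:            assumed weight balancing the two tasks.
    """
    l_task = cross_entropy(task_logits, task_label)
    l_order = sum(
        cross_entropy(lg, pos) for lg, pos in zip(order_logits, true_positions)
    ) / len(true_positions)
    return l_task + lam * l_order
```

With uniform logits, each cross-entropy term reduces to log of the number of classes, which makes the sketch easy to sanity-check; at inference, only the downstream head would be used and the ordering term dropped.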
