Summarization Corpora of Wikipedia Articles

2020-05-01 · LREC 2020

Dominik Frefel


Code

github.com/domfr/gewiki
Official · In paper

Abstract

In this paper, we propose a process for extracting summarization corpora from Wikipedia articles. Applying it to German, we create a corpus of 240,000 texts. We use ROUGE scores both to extract and to evaluate the corpus, and for this purpose we provide a ROUGE implementation adapted to German. We use the extracted corpus to train three abstractive summarization models, which we compare against several baselines. The resulting summaries read naturally and cover the content of the input texts well. The corpus can be downloaded at https://github.com/domfr/GeWiki.
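The abstract does not say how the German ROUGE variant differs from the standard metric or how the scores drive extraction. A minimal sketch of plain ROUGE-N with clipped n-gram counts follows for orientation. It is not the author's implementation (that lives in the linked repository). The Unicode-aware tokenizer, the `keep_pair` filter and its 0.3 threshold are hypothetical choices made only for illustration.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Assumption: lowercase and split on Unicode word characters, so German
    # umlauts and ß ("Straße", "Übersetzung") stay inside tokens.
    # This is NOT the paper's German-adapted tokenizer.
    return re.findall(r"\w+", text.lower())


def ngrams(tokens: list[str], n: int) -> Counter:
    # Multiset of contiguous n-grams.
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(candidate: str, reference: str, n: int = 1) -> dict[str, float]:
    cand = ngrams(tokenize(candidate), n)
    ref = ngrams(tokenize(reference), n)
    # Counter intersection keeps the minimum count per n-gram (clipped matches).
    overlap = sum((cand & ref).values())
    precision = overlap / sum(cand.values()) if cand else 0.0
    recall = overlap / sum(ref.values()) if ref else 0.0
    f1 = 2 * precision * recall / (precision + recall) if overlap else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


def keep_pair(summary: str, body: str, threshold: float = 0.3) -> bool:
    # Hypothetical extraction filter: keep an article only if enough of its
    # lead section's unigrams also occur in the body. With the summary as
    # reference, ROUGE-1 recall is exactly that fraction. The threshold is
    # an arbitrary placeholder, not a value from the paper.
    return rouge_n(candidate=body, reference=summary, n=1)["recall"] >= threshold


if __name__ == "__main__":
    print(rouge_n("Die Katze schläft.", "Die Katze schläft auf dem Sofa."))
    print(keep_pair("Die Katze schläft.", "Die Katze schläft auf dem Sofa."))
```

Counting matches through `Counter` intersection follows the usual ROUGE-N definition, in which a repeated n-gram in the candidate can only match as often as it appears in the reference. A German-specific variant would most likely change only `tokenize` (for example by adding stemming or compound splitting). Whether the paper does this is not stated here.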

Tasks

Abstractive Text Summarization · Articles
