Russian-Language Multimodal Dataset for Automatic Summarization of Scientific Papers
2024-05-13
Alena Tsanda, Elena Bruches
- github.com/iis-research-team/summarization-dataset (official, linked in the paper)
Abstract
The paper describes the creation of a multimodal dataset of Russian-language scientific papers and the evaluation of existing language models on the task of automatic text summarization. The dataset is multimodal: in addition to texts, it includes tables and figures. The paper presents the results of experiments with two language models: GigaChat from SBER and YandexGPT from Yandex. The dataset consists of 420 papers and is publicly available at https://github.com/iis-research-team/summarization-dataset.
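To make the idea of a multimodal paper record more concrete, the following is a minimal Python sketch of how one paper with text, tables, and figures might be represented and flattened into plain text for a text-only summarization model. All class names, field names (including the hypothetical `reference_summary`), and the linearization scheme are illustrative assumptions. They are not the actual format of the iis-research-team/summarization-dataset repository, nor the procedure the authors used with GigaChat or YandexGPT.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Table:
    caption: str
    rows: List[List[str]]  # hypothetical layout: first row holds the header


@dataclass
class Figure:
    caption: str
    image_path: str  # hypothetical path to the image file


@dataclass
class PaperRecord:
    # Hypothetical schema for one paper; not the repository's real format.
    title: str
    text: str
    tables: List[Table] = field(default_factory=list)
    figures: List[Figure] = field(default_factory=list)
    reference_summary: str = ""  # hypothetical gold summary field


def linearize(paper: PaperRecord) -> str:
    """Flatten a record into plain text for a text-only model.

    Tables become pipe-separated rows; figures are represented only by
    their captions, because the image itself cannot be passed as text.
    """
    parts = [paper.title, paper.text]
    for i, table in enumerate(paper.tables, start=1):
        parts.append(f"Table {i}: {table.caption}")
        parts.extend(" | ".join(row) for row in table.rows)
    for i, fig in enumerate(paper.figures, start=1):
        parts.append(f"Figure {i}: {fig.caption}")
    return "\n".join(parts)


# Usage with made-up placeholder content (not taken from the dataset).
example = PaperRecord(
    title="Example",
    text="Body text.",
    tables=[Table("Results", [["Model", "Score"], ["A", "0.5"]])],
    figures=[Figure("Architecture", "fig1.png")],
)
print(linearize(example))
```

Representing figures by their captions alone is one simple design choice for text-only models; a model that accepts images could instead receive `image_path` contents directly.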