
Russian-Language Multimodal Dataset for Automatic Summarization of Scientific Papers

2024-05-13

Alena Tsanda, Elena Bruches

Abstract

The paper describes the creation of a multimodal dataset of Russian-language scientific papers and the evaluation of existing language models on automatic text summarization. A distinguishing feature of the dataset is its multimodality: in addition to texts, it includes tables and figures. The paper reports the results of experiments with two language models, Gigachat from SBER and YandexGPT from Yandex. The dataset consists of 420 papers and is publicly available at https://github.com/iis-research-team/summarization-dataset.
