SOTAVerified

Intelligent Grimm - Open-ended Visual Storytelling via Latent Diffusion Models

2024-01-01CVPR 2024Code Available3· sign in to hype

Chang Liu, HaoNing Wu, Yujie Zhong, Xiaoyun Zhang, Yanfeng Wang, Weidi Xie

Code Available — Be the first to reproduce this paper.

Reproduce

Code

Abstract

Generative models have recently exhibited exceptional capabilities in text-to-image generation but still struggle to generate image sequences coherently. In this work we focus on a novel yet challenging task of generating a coherent image sequence based on a given storyline denoted as open-ended visual storytelling. We make the following three contributions: (i) to fulfill the task of visual storytelling we propose a learning-based auto-regressive image generation model termed as StoryGen with a novel vision-language context module that enables to generate the current frame by conditioning on the corresponding text prompt and preceding image-caption pairs; (ii) to address the data shortage of visual storytelling we collect paired image-text sequences by sourcing from online videos and open-source E-books establishing processing pipeline for constructing a large-scale dataset with diverse characters storylines and artistic styles named StorySalon; (iii) Quantitative experiments and human evaluations have validated the superiority of our StoryGen where we show it can generalize to unseen characters without any optimization and generate image sequences with coherent content and consistent character. Code dataset and models are available at https://haoningwu3639.github.io/StoryGen_Webpage/

Tasks

Reproductions