SOTAVerified

ContextualStory: Consistent Visual Storytelling with Spatially-Enhanced and Storyline Context

2024-07-13 · Code Available

Sixiao Zheng, Yanwei Fu

Abstract

Visual storytelling involves generating a sequence of coherent frames from a textual storyline while maintaining consistency in characters and scenes. Existing autoregressive methods, which rely on previous frame-sentence pairs, struggle with high memory usage, slow generation speeds, and limited context integration. To address these issues, we propose ContextualStory, a novel framework designed to generate coherent story frames and extend frames for visual storytelling. ContextualStory utilizes Spatially-Enhanced Temporal Attention to capture spatial and temporal dependencies, handling significant character movements effectively. Additionally, we introduce a Storyline Contextualizer to enrich context in storyline embedding, and a StoryFlow Adapter to measure scene changes between frames for guiding the model. Extensive experiments on PororoSV and FlintstonesSV datasets demonstrate that ContextualStory significantly outperforms existing SOTA methods in both story visualization and continuation. Code is available at https://github.com/sixiaozheng/ContextualStory.
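The paper does not detail its Spatially-Enhanced Temporal Attention here, but the core idea of attending across frames can be illustrated with plain self-attention over the time axis, applied independently at each spatial location. The sketch below is an assumption for illustration only (function name, feature shapes, and single-head design are not from the paper's implementation):

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def temporal_attention(frames):
    """Self-attention over the time axis at each spatial location.

    frames: (T, H, W, C) array of per-frame feature maps, where T is the
    number of story frames. Returns an array of the same shape.
    """
    T, H, W, C = frames.shape
    # Regroup so each spatial position holds a length-T sequence: (H*W, T, C).
    x = frames.transpose(1, 2, 0, 3).reshape(H * W, T, C)
    # Scaled dot-product scores between frames: (H*W, T, T).
    scores = x @ x.transpose(0, 2, 1) / np.sqrt(C)
    attn = softmax(scores, axis=-1)
    # Mix frame features according to the attention weights.
    out = attn @ x
    return out.reshape(H, W, T, C).transpose(2, 0, 1, 3)

frames = np.random.default_rng(0).standard_normal((5, 4, 4, 8))
out = temporal_attention(frames)
print(out.shape)  # (5, 4, 4, 8)
```

In this toy form, each pixel location exchanges information across the 5 frames, which is the kind of temporal dependency the paper's attention module is designed to capture (its spatial enhancement is not modeled here).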

Benchmark Results

Dataset         Model            Metric   Claimed   Verified   Status
FlintstonesSV   ContextualStory  FID      16.33     —          Unverified
PororoSV        ContextualStory  FID      14.2      —          Unverified

Reproductions