SAGE: Structure-Aware Generative Video Transitions between Diverse Clips
Mia Kan, Yilin Liu, Niloy Mitra
Abstract
Video transitions aim to synthesize intermediate frames between two clips, but naive approaches such as linear blending introduce artifacts and break temporal coherence, limiting professional use. Traditional techniques (cross-fades, morphing, frame interpolation) and recent generative inbetweening methods can produce high-quality, plausible intermediates, but they struggle to bridge diverse clips with large temporal gaps or significant semantic differences, leaving a gap for content-aware, visually coherent transitions. We address this challenge by drawing on artistic workflows, distilling strategies such as aligning silhouettes and interpolating salient features to preserve structure and perceptual continuity. Building on these strategies, we propose SAGE (Structure-Aware Generative vidEo transitions), a simple yet effective zero-shot approach that combines structural guidance, provided via line maps and motion flow, with generative synthesis, enabling smooth, motion-consistent transitions without fine-tuning. Extensive experiments and comparisons with current alternatives, namely [FILM, TVG, DiffMorpher, VACE, GI], demonstrate that SAGE outperforms both classical and recent generative baselines on quantitative metrics and in user studies for producing transitions between diverse clips. This simple method also bypasses the need to acquire suitable training data, which is particularly difficult in our creative setting involving diverse clips. Code is available via the project page at https://kan32501.github.io/sage.github.io/.
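To make the zero-shot, structure-guided idea in the abstract concrete, the sketch below illustrates one way such a pipeline could be organized: extract line maps and motion flow from the boundary frames of the two clips, interpolate them as structural guidance, and hand that guidance to a pretrained generative model. This is a minimal illustration under stated assumptions, not the authors' implementation: Canny edges and Farneback flow stand in for the paper's line maps and motion flow, and `generate_transition` is a hypothetical placeholder for a structure-conditioned video generator.

```python
# Minimal sketch of structure-guided transition synthesis (illustrative only).
# Assumptions: Canny edges approximate the line maps, Farneback flow approximates
# the motion flow, and `generate_transition` is a hypothetical hook for a
# pretrained, structure-conditioned video generator (no fine-tuning involved).
import cv2
import numpy as np


def line_map(frame_bgr: np.ndarray) -> np.ndarray:
    """Coarse structural line map of a frame (Canny edges as a stand-in)."""
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    return cv2.Canny(gray, 100, 200)


def motion_flow(frame_a: np.ndarray, frame_b: np.ndarray) -> np.ndarray:
    """Dense flow between the two boundary frames (Farneback as a stand-in)."""
    ga = cv2.cvtColor(frame_a, cv2.COLOR_BGR2GRAY)
    gb = cv2.cvtColor(frame_b, cv2.COLOR_BGR2GRAY)
    return cv2.calcOpticalFlowFarneback(ga, gb, None, 0.5, 3, 15, 3, 5, 1.2, 0)


def structural_guidance(frame_a: np.ndarray, frame_b: np.ndarray, n_frames: int):
    """Yield per-frame (line map, flow) guidance between clip A's last frame
    and clip B's first frame. A naive linear blend is used here; the paper
    aligns silhouettes and salient features rather than blending pixels."""
    la = line_map(frame_a).astype(np.float32)
    lb = line_map(frame_b).astype(np.float32)
    flow = motion_flow(frame_a, frame_b)
    for i in range(1, n_frames + 1):
        t = i / (n_frames + 1)
        yield (1.0 - t) * la + t * lb, t * flow


def generate_transition(frame_a, frame_b, guidance):
    """Hypothetical hook: condition a pretrained video generator on the
    interpolated structural maps to synthesize the transition frames."""
    raise NotImplementedError("plug in a structure-conditioned video generator")
```

Because all guidance is computed at inference time and the generator is used as-is, a pipeline of this shape stays zero-shot: no transition-specific training data is required, which matches the motivation stated above.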