EvAnimate: Event-conditioned Image-to-Video Generation for Human Animation

2025-03-24

Qiang Qu, Ming Li, Xiaoming Chen, Tongliang Liu

Abstract

Conditional human animation traditionally animates static reference images using pose-based motion cues extracted from video data. However, these video-derived cues often suffer from low temporal resolution, motion blur, and unreliable performance under challenging lighting conditions. In contrast, event cameras inherently provide robust motion information at high temporal resolution and are resilient to motion blur, low-light environments, and exposure variations. In this paper, we propose EvAnimate, the first method to leverage event streams as robust and precise motion cues for conditional human image animation. Our approach is fully compatible with diffusion-based generative models: it encodes asynchronous event data into a specialized three-channel representation with adaptive slicing rates and densities. A dual-branch architecture explicitly designed to exploit event-driven dynamics produces high-quality, temporally coherent animations and significantly improves performance under challenging real-world conditions. Specialized augmentation strategies further improve cross-subject generalization. To facilitate future research, we establish a new benchmark, including simulated event data for training and validation and a real-world event dataset capturing human actions under normal and challenging scenarios. Experimental results demonstrate that EvAnimate achieves high temporal fidelity and robust performance in scenarios where traditional video-derived cues fall short.
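
The abstract does not specify how the three-channel event representation is built, so the following is only an illustrative sketch and not the authors' encoding. It assumes events arrive as arrays of pixel coordinates, timestamps, and polarities, and it slices the stream by a fixed event count, which is one simple way to make the slicing rate adapt to motion (fast motion produces more slices per second). The function name `events_to_frames`, the parameter `events_per_slice`, and the channel layout (positive counts, negative counts, normalized latest timestamp) are all hypothetical choices.

```python
import numpy as np

def events_to_frames(x, y, t, p, height, width, events_per_slice):
    """Turn an event stream into a stack of 3-channel frames.

    Hypothetical layout (an assumption, not taken from the paper):
      channel 0: count of positive-polarity events per pixel
      channel 1: count of negative-polarity events per pixel
      channel 2: latest event time per pixel, normalized to [0, 1]
                 within the slice
    """
    x, y, t, p = (np.asarray(a) for a in (x, y, t, p))
    # Sort events by timestamp so each slice covers a contiguous time window.
    order = np.argsort(t, kind="stable")
    x, y, t, p = x[order], y[order], t[order], p[order]

    frames = []
    # Count-based slicing: each frame holds the same number of events,
    # so slices are shorter in time when the scene moves quickly.
    for start in range(0, len(t), events_per_slice):
        sl = slice(start, start + events_per_slice)
        xs, ys, ts, ps = x[sl], y[sl], t[sl], p[sl]
        frame = np.zeros((3, height, width), dtype=np.float32)

        pos = ps > 0
        # np.add.at accumulates correctly when a pixel appears more than once.
        np.add.at(frame[0], (ys[pos], xs[pos]), 1.0)
        np.add.at(frame[1], (ys[~pos], xs[~pos]), 1.0)

        span = float(ts[-1] - ts[0])
        if span > 0:
            rel = ((ts - ts[0]) / span).astype(np.float32)
        else:
            rel = np.ones(len(ts), dtype=np.float32)
        # Keep the largest (most recent) normalized time at each pixel.
        np.maximum.at(frame[2], (ys, xs), rel)

        frames.append(frame)

    if not frames:
        return np.zeros((0, 3, height, width), dtype=np.float32)
    return np.stack(frames)
```

Because each frame is a fixed-size three-channel array, a stack produced this way has the same shape as a sequence of RGB images, which is the kind of input that image-based diffusion backbones typically expect. A real pipeline would likely also normalize the count channels and choose `events_per_slice` per sequence.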

Tasks

Benchmarking, Data Augmentation, Human Animation, Image Animation, Image to Video Generation, Video Generation
