UDE: A Unified Driving Engine for Human Motion Generation
Zixiang Zhou, Baoyuan Wang
- Official PyTorch implementation: github.com/zixiangzhou916/ude
Abstract
Generating controllable and editable human motion sequences is a key challenge in 3D avatar generation. Producing and animating human motion was long a labor-intensive process, until learning-based approaches were developed and applied in recent years. However, these approaches remain task-specific or modality-specific [ahuja2019language2pose, ghosh2021synthesis, ferreira2021learning, li2021ai]. In this paper, we propose "UDE", the first unified driving engine that generates human motion sequences from either natural language or audio sequences (see the teaser figure). Specifically, UDE consists of the following key components: 1) a motion quantization module based on VQ-VAE that represents a continuous motion sequence as discrete latent codes [van2017neural]; 2) a modality-agnostic transformer encoder [vaswani2017attention] that learns to map modality-aware driving signals into a joint space; 3) a unified token transformer (GPT-like [radford2019language]) network that predicts the quantized latent code indices auto-regressively; and 4) a diffusion motion decoder that takes the motion tokens as input and decodes them into motion sequences with high diversity. We evaluate our method on the HumanML3D [Guo_2022_CVPR] and AIST++ [li2021learn] benchmarks, and the experimental results demonstrate that our method achieves state-of-the-art performance. Project website: https://github.com/zixiangzhou916/UDE/
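The quantization step in component 1) is the core of the VQ-VAE idea: each frame's continuous motion feature is replaced by the index of its nearest codebook entry, so the sequence can later be modeled as discrete tokens by the GPT-like transformer. Below is a minimal NumPy sketch of that nearest-neighbor lookup; the codebook size (512), feature dimension (64), and sequence length (120) are illustrative assumptions, not values from the paper.

```python
import numpy as np

def quantize(features, codebook):
    """Map each continuous feature vector (one per frame) to the index of
    its nearest codebook entry -- the VQ-VAE discretization step."""
    # (T, 1, D) - (1, K, D) -> (T, K) squared Euclidean distances
    d2 = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    idx = d2.argmin(axis=1)   # discrete token per frame, shape (T,)
    recon = codebook[idx]     # de-quantized vectors fed to the decoder
    return idx, recon

rng = np.random.default_rng(0)
codebook = rng.normal(size=(512, 64))  # K=512 codes, D=64 dims (assumed)
motion = rng.normal(size=(120, 64))    # a 120-frame motion feature sequence
tokens, recon = quantize(motion, codebook)
print(tokens.shape)  # (120,)
```

The resulting `tokens` array is exactly the kind of discrete index sequence that component 3) predicts auto-regressively and component 4) decodes back into diverse motion.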
Benchmark Results
| Dataset | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| AIST++ | UDE | FID | 17.25 | — | Unverified |