SOTAVerified

UDE: A Unified Driving Engine for Human Motion Generation

2022-11-29 · CVPR 2023 · Code Available

Zixiang Zhou, Baoyuan Wang


Abstract

Generating controllable and editable human motion sequences is a key challenge in 3D avatar generation. Producing and animating human motion was labor-intensive for a long time, until learning-based approaches were developed and applied recently. However, these approaches are still task-specific or modality-specific [ahuja2019language2pose, ghosh2021synthesis, ferreira2021learning, li2021ai]. In this paper, we propose "UDE", the first unified driving engine that enables generating human motion sequences from either natural language or audio sequences (see the teaser figure). Specifically, UDE consists of the following key components: 1) a motion quantization module based on VQ-VAE that represents a continuous motion sequence as discrete latent codes [van2017neural]; 2) a modality-agnostic transformer encoder [vaswani2017attention] that learns to map modality-aware driving signals to a joint space; 3) a unified token transformer (GPT-like [radford2019language]) network that predicts the quantized latent code indices in an auto-regressive manner; and 4) a diffusion motion decoder that takes the motion tokens as input and decodes them into motion sequences with high diversity. We evaluate our method on the HumanML3D [Guo_2022_CVPR] and AIST++ [li2021learn] benchmarks, and the experimental results demonstrate that our method achieves state-of-the-art performance. Project website: https://github.com/zixiangzhou916/UDE/
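The first component of the pipeline above, motion quantization, maps each continuous motion frame to its nearest entry in a learned codebook. The following is a toy, pure-Python sketch of that nearest-code lookup only — the codebook here is random and all sizes and names are illustrative, not the paper's actual implementation, which learns the codebook with a VQ-VAE.

```python
import random

def vq_quantize(motion, codebook):
    """Map each continuous motion frame to the index of its nearest
    codebook entry -- the discretization step a VQ-VAE performs."""
    def sqdist(a, b):
        # squared Euclidean distance between two feature vectors
        return sum((x - y) ** 2 for x, y in zip(a, b))
    indices = [min(range(len(codebook)),
                   key=lambda k: sqdist(frame, codebook[k]))
               for frame in motion]
    # reconstruction replaces each frame with its chosen code vector
    reconstructed = [codebook[i] for i in indices]
    return indices, reconstructed

# Toy sizes (illustrative only): K codes, D-dim latents, T frames.
random.seed(0)
K, D, T = 8, 4, 6
codebook = [[random.gauss(0, 1) for _ in range(D)] for _ in range(K)]
motion = [[random.gauss(0, 1) for _ in range(D)] for _ in range(T)]
tokens, recon = vq_quantize(motion, codebook)
print(tokens)  # one discrete token index in [0, K) per motion frame
```

The resulting token sequence is what the GPT-like transformer (component 3) would predict auto-regressively before the diffusion decoder (component 4) maps tokens back to continuous motion.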

Tasks

Benchmark Results

| Dataset | Model | Metric | Claimed | Verified | Status     |
|---------|-------|--------|---------|----------|------------|
| AIST++  | UDE   | FID    | 17.25   |          | Unverified |

Reproductions