T2M-GPT: Generating Human Motion from Textual Descriptions with Discrete Representations
Jianrong Zhang, Yangsong Zhang, Xiaodong Cun, Shaoli Huang, Yong Zhang, Hongwei Zhao, Hongtao Lu, Xi Shen
- github.com/Mael-zys/T2M-GPT (official, PyTorch, ★ 754)
Abstract
In this work, we investigate a simple and well-known conditional generative framework based on the Vector Quantised-Variational AutoEncoder (VQ-VAE) and the Generative Pre-trained Transformer (GPT) for human motion generation from textual descriptions. We show that a simple CNN-based VQ-VAE with commonly used training recipes (EMA and Code Reset) yields high-quality discrete representations. For GPT, we incorporate a simple corruption strategy during training to alleviate the training-testing discrepancy. Despite its simplicity, our T2M-GPT outperforms competitive approaches, including recent diffusion-based methods. For example, on HumanML3D, currently the largest dataset, we achieve comparable consistency between text and generated motion (R-Precision), while our FID of 0.116 largely outperforms MotionDiffuse's 0.630. Additionally, our analyses on HumanML3D suggest that dataset size is a limitation of our approach. Our work shows that VQ-VAE remains a competitive approach for human motion generation.
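The corruption strategy mentioned above (whose rate corresponds to the τ values in the benchmark table below) can be sketched as follows: during GPT training, each ground-truth code index from the VQ-VAE is replaced with a uniformly random codebook index with probability τ, so the model learns to tolerate its own prediction errors at inference time. This is a minimal illustration, not the authors' implementation; the function name and signature are assumptions.

```python
import random


def corrupt_indices(indices, codebook_size, tau, rng=None):
    """Illustrative sketch of the training-time corruption strategy:
    each code index is replaced by a uniform random index from the
    codebook with probability tau (tau = 0 means no corruption)."""
    rng = rng or random.Random()
    return [
        rng.randrange(codebook_size) if rng.random() < tau else idx
        for idx in indices
    ]
```

With τ = 0 the sequence is untouched; with τ ∈ U[0, 1] a fresh rate would be sampled per training sequence.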
Benchmark Results
| Dataset | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| HumanML3D | T2M-GPT (τ = 0) | FID | 0.14 | — | Unverified |
| HumanML3D | T2M-GPT (τ ∈ U[0, 1]) | FID | 0.14 | — | Unverified |
| HumanML3D | T2M-GPT (τ = 0.5) | FID | 0.12 | — | Unverified |
| KIT Motion-Language | T2M-GPT (τ = 0) | FID | 0.74 | — | Unverified |
| KIT Motion-Language | T2M-GPT (τ ∈ U[0, 1]) | FID | 0.51 | — | Unverified |
| KIT Motion-Language | T2M-GPT (τ = 0.5) | FID | 0.72 | — | Unverified |
| Motion-X | T2M-GPT | FID | 1.37 | — | Unverified |