
A Unified Transformer-Based Framework with Pretraining For Whole Body Grasping Motion Generation

2025-07-01

Edward Effendy, Kuan-Wei Tseng, Rei Kawakami


Abstract

Accepted at ICIP 2025. We present a novel transformer-based framework for whole-body grasping that addresses both pose generation and motion infilling, enabling realistic and stable object interactions. Our pipeline comprises three stages: Grasp Pose Generation, which produces the full-body grasp pose; Temporal Infilling, which ensures smooth motion continuity; and a LiftUp Transformer, which refines downsampled joints back to high-resolution markers. To overcome the scarcity of hand-object interaction data, we introduce a data-efficient Generalized Pretraining stage on large, diverse motion datasets, yielding robust spatio-temporal representations transferable to grasping tasks. Experiments on the GRAB dataset show that our method outperforms state-of-the-art baselines in coherence, stability, and visual realism. The modular design also supports easy adaptation to other human-motion applications.
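The three-stage data flow described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: the joint and marker counts, the object encoding, and all three stage bodies are placeholder assumptions (e.g. linear interpolation stands in for the infilling transformer, and a fixed linear lifting matrix stands in for the LiftUp Transformer); only the stage interfaces and tensor shapes reflect the pipeline described above.

```python
import numpy as np

# Placeholder dimensions -- the real marker/joint counts depend on the
# paper's GRAB/body-model setup and are assumptions here.
N_JOINTS = 25    # downsampled joint set the transformers operate on (assumed)
N_MARKERS = 99   # high-resolution surface markers (assumed)
T = 30           # motion clip length in frames (assumed)

rng = np.random.default_rng(0)

def grasp_pose_generation(object_feat):
    """Stage 1 (sketch): map an object feature vector to a full-body
    grasp pose over the downsampled joints."""
    W = rng.standard_normal((object_feat.shape[0], N_JOINTS * 3)) * 0.01
    return (object_feat @ W).reshape(N_JOINTS, 3)

def temporal_infilling(start_pose, grasp_pose, n_frames=T):
    """Stage 2 (sketch): fill in the motion between a start pose and the
    generated grasp pose.  Linear interpolation stands in for the
    temporal-infilling transformer."""
    alphas = np.linspace(0.0, 1.0, n_frames)[:, None, None]
    return (1.0 - alphas) * start_pose[None] + alphas * grasp_pose[None]

def liftup(joint_motion):
    """Stage 3 (sketch): lift downsampled joints back to dense markers.
    A fixed linear lifting matrix stands in for the LiftUp Transformer."""
    L = rng.standard_normal((N_MARKERS, N_JOINTS)) / N_JOINTS
    return np.einsum('mk,tkd->tmd', L, joint_motion)

obj = rng.standard_normal(64)              # placeholder object encoding
grasp = grasp_pose_generation(obj)         # (N_JOINTS, 3)
rest = np.zeros((N_JOINTS, 3))             # placeholder starting pose
motion = temporal_infilling(rest, grasp)   # (T, N_JOINTS, 3)
markers = liftup(motion)                   # (T, N_MARKERS, 3)
```

The point of the sketch is the shape contract between stages: stage 1 outputs a single pose, stage 2 expands it into a trajectory over the same downsampled joints, and stage 3 upsamples every frame to the dense marker set.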
