Making Sense of Vision and Touch: Self-Supervised Learning of Multimodal Representations for Contact-Rich Tasks

2018-10-24

Michelle A. Lee, Yuke Zhu, Krishnan Srinivasan, Parth Shah, Silvio Savarese, Li Fei-Fei, Animesh Garg, Jeannette Bohg

Code

github.com/stanford-iprl-lab/multimodal_representation (official, PyTorch, ★ 0)
github.com/Henry1iu/ierg5350_rl_course_project (PyTorch, ★ 31)

Abstract

Contact-rich manipulation tasks in unstructured environments often require both haptic and visual feedback. However, it is non-trivial to manually design a robot controller that combines modalities with very different characteristics. While deep reinforcement learning has shown success in learning control policies for high-dimensional inputs, these algorithms are generally intractable to deploy on real robots due to sample complexity. We use self-supervision to learn a compact and multimodal representation of our sensory inputs, which can then be used to improve the sample efficiency of our policy learning. We evaluate our method on a peg insertion task, generalizing over different geometry, configurations, and clearances, while being robust to external perturbations. Results for simulated and real robot experiments are presented.
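
The abstract does not say which self-supervised objectives are used. As one illustration of how training labels can come from the data itself rather than from manual annotation, the sketch below builds aligned and misaligned pairs from two time-synchronized sensor streams; a model trained to tell them apart has to relate the two modalities. The function name `make_alignment_pairs`, the toy inputs, and the choice of objective are assumptions for illustration, not the authors' implementation.

```python
import random


def make_alignment_pairs(images, forces, seed=0):
    """Build (image, force, label) triples without manual annotation.

    label 1: the force reading was recorded at the same time step as the image.
    label 0: the force reading comes from a different, randomly chosen time step.

    Hypothetical helper for illustration only.
    """
    if len(images) != len(forces):
        raise ValueError("streams must have the same number of time steps")
    if len(images) < 2:
        raise ValueError("need at least two time steps to build a mismatched pair")

    rng = random.Random(seed)
    pairs = []
    for t in range(len(images)):
        # Positive example: both modalities come from time step t.
        pairs.append((images[t], forces[t], 1))

        # Negative example: draw an index from the other len - 1 time steps,
        # then shift it past t so it can never equal t.
        other = rng.randrange(len(forces) - 1)
        if other >= t:
            other += 1
        pairs.append((images[t], forces[other], 0))
    return pairs


# Toy stand-ins for camera frames and force/torque readings.
images = ["frame0", "frame1", "frame2"]
forces = [0.1, 0.5, 0.9]
pairs = make_alignment_pairs(images, forces)
```

Each time step yields one positive and one negative pair, so the labels are balanced by construction and cost nothing beyond the recorded data.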

Tasks

Contact-rich Manipulation, Deep Reinforcement Learning, Reinforcement Learning (RL), Self-Supervised Learning
