A Framework for Multisensory Foresight for Embodied Agents

2021-09-15

Xiaohui Chen, Ramtin Hosseini, Karen Panetta, Jivko Sinapov

Code

github.com/tufts-ai-robotics-group/mmvp
Official implementation, linked in the paper (PyTorch)

Abstract

Predicting future sensory states is crucial for learning agents such as robots, drones, and autonomous vehicles. In this paper, we couple multiple sensory modalities with exploratory actions and propose a predictive neural network architecture to address this problem. Most existing approaches rely on large, manually annotated datasets, or only use visual data as a single modality. In contrast, the unsupervised method presented here uses multi-modal perceptions for predicting future visual frames. As a result, the proposed model is more comprehensive and can better capture the spatio-temporal dynamics of the environment, leading to more accurate visual frame prediction. The other novelty of our framework is the use of sub-networks dedicated to anticipating future haptic, audio, and tactile signals. The framework was tested and validated with a dataset containing 4 sensory modalities (vision, haptic, audio, and tactile) on a humanoid robot performing 9 behaviors multiple times on a large set of objects. While the visual information is the dominant modality, utilizing the additional non-visual modalities improves the accuracy of predictions.

Tasks

Autonomous Vehicles
