MUVO: A Multimodal World Model with Spatial Representations for Autonomous Driving
2023-11-20
Daniel Bogdoll, Yitian Yang, Tim Joseph, J. Marius Zöllner
- Code: github.com/fzi-forschungszentrum-informatik/muvo (official, PyTorch)
Abstract
Learning unsupervised world models for autonomous driving has the potential to improve the reasoning capabilities of today's systems dramatically. However, most work neglects the physical attributes of the world and focuses on sensor data alone. We propose MUVO, a MUltimodal World Model with spatial VOxel representations, to address this challenge. We utilize raw camera and lidar data to learn a sensor-agnostic geometric representation of the world. We demonstrate multimodal future predictions and show that our spatial representation improves the prediction quality of both camera images and lidar point clouds.
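To make the described pipeline concrete, here is a minimal sketch of what a multimodal world model interface along these lines could look like: raw camera images and lidar point clouds are encoded, fused into a shared latent, and decoded into a 3D voxel occupancy grid plus future camera and lidar predictions. All module names, shapes, and layer choices below are illustrative assumptions and do not reflect the authors' actual architecture.

```python
import torch
import torch.nn as nn


class MultimodalWorldModelSketch(nn.Module):
    """Hypothetical interface sketch of a camera + lidar world model with a
    voxel representation. Names and shapes are assumptions, not MUVO itself."""

    def __init__(self, latent_dim=512, voxel_grid=(64, 64, 8)):
        super().__init__()
        self.voxel_grid = voxel_grid
        # Placeholder encoder for raw camera images.
        self.camera_encoder = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=7, stride=4, padding=3),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
            nn.Linear(64, latent_dim),
        )
        # Placeholder point-wise encoder for lidar (x, y, z, intensity).
        self.lidar_encoder = nn.Sequential(
            nn.Linear(4, 128), nn.ReLU(), nn.Linear(128, latent_dim)
        )
        # Fused latent -> sensor-agnostic 3D occupancy (voxel) representation.
        self.voxel_head = nn.Linear(
            2 * latent_dim, voxel_grid[0] * voxel_grid[1] * voxel_grid[2]
        )
        # Heads decoding the fused latent into future observations per sensor.
        self.camera_head = nn.Linear(2 * latent_dim, 3 * 64 * 64)
        self.lidar_head = nn.Linear(2 * latent_dim, 1024 * 3)

    def forward(self, image, points):
        # image: (B, 3, H, W); points: (B, N, 4)
        z_cam = self.camera_encoder(image)
        z_lidar = self.lidar_encoder(points).mean(dim=1)  # pool over points
        z = torch.cat([z_cam, z_lidar], dim=-1)
        voxels = self.voxel_head(z).view(-1, *self.voxel_grid)
        next_image = self.camera_head(z).view(-1, 3, 64, 64)
        next_points = self.lidar_head(z).view(-1, 1024, 3)
        return voxels, next_image, next_points


# Usage example with dummy inputs.
model = MultimodalWorldModelSketch()
voxels, next_image, next_points = model(
    torch.randn(2, 3, 128, 256), torch.randn(2, 4096, 4)
)
print(voxels.shape, next_image.shape, next_points.shape)
```

The key design point this sketch illustrates is that both sensors are mapped into a single latent that supports a geometric (voxel) decoding in addition to per-sensor future predictions, which is the sense in which the learned representation is sensor-agnostic.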