MUVO: A Multimodal World Model with Spatial Representations for Autonomous Driving

2023-11-20 · Code Available

Daniel Bogdoll, Yitian Yang, Tim Joseph, J. Marius Zöllner

Abstract

Learning unsupervised world models for autonomous driving has the potential to improve the reasoning capabilities of today's systems dramatically. However, most work neglects the physical attributes of the world and focuses on sensor data alone. We propose MUVO, a MUltimodal World Model with spatial VOxel representations, to address this challenge. We utilize raw camera and lidar data to learn a sensor-agnostic geometric representation of the world. We demonstrate multimodal future predictions and show that our spatial representation improves the prediction quality of both camera images and lidar point clouds.
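The core idea in the abstract, projecting raw camera and lidar observations into a shared, sensor-agnostic voxel grid, can be sketched in a few lines. This is a toy illustration under assumed shapes and parameters, not the authors' implementation: lidar points are scattered into a binary occupancy grid, and a camera feature vector is broadcast over the occupied voxels.

```python
import numpy as np

def voxelize(points, grid_size=(8, 8, 4), extent=4.0):
    """Scatter lidar points of shape (N, 3) into a binary occupancy grid.

    Toy stand-in for a spatial voxel representation: coordinates in
    [-extent, extent) are mapped to integer voxel indices.
    """
    grid = np.zeros(grid_size, dtype=np.float32)
    idx = ((points + extent) / (2 * extent) * np.array(grid_size)).astype(int)
    valid = np.all((idx >= 0) & (idx < np.array(grid_size)), axis=1)
    for i, j, k in idx[valid]:
        grid[i, j, k] = 1.0
    return grid

def fuse(occupancy, image_feature):
    """Broadcast a per-image camera feature over occupied voxels.

    Hypothetical fusion step: yields a (X, Y, Z, C) multimodal grid.
    """
    return occupancy[..., None] * image_feature

# Two example lidar points and a 16-dim camera feature (both made up).
points = np.array([[0.0, 0.0, 0.0], [1.0, -1.0, 0.5]])
occ = voxelize(points)
fused = fuse(occ, np.ones(16))
print(occ.sum(), fused.shape)  # → 2.0 (8, 8, 4, 16)
```

In the actual model, learned encoders and a latent dynamics model would replace these hand-written projections; the sketch only shows why a voxel grid makes the representation sensor-agnostic: both modalities end up in one spatial coordinate frame.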
