Open-Vocabulary Online Semantic Mapping for SLAM

2024-11-22 · Code Available

Tomas Berriel Martins, Martin R. Oswald, Javier Civera

Abstract

This paper presents an Open-Vocabulary Online 3D semantic mapping pipeline, which we denote by its acronym OVO. Given a sequence of posed RGB-D frames, we detect and track 3D segments and describe each of them with a CLIP vector. A novel CLIP merging method computes these vectors from the viewpoints in which each segment is observed. Notably, OVO has a significantly lower computational and memory footprint than offline baselines while also achieving better segmentation metrics. We further integrate our mapping contributions with two different SLAM backbones (Gaussian-SLAM and ORB-SLAM2) and report experimental results, which are the first to demonstrate end-to-end open-vocabulary online 3D reconstruction without relying on ground-truth camera poses or scene geometry.
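The abstract does not specify how the novel CLIP merging method combines per-view descriptors. As a point of reference only, the sketch below shows the simplest baseline such a method could be compared against: averaging the L2-normalized CLIP vectors from every view in which a 3D segment is observed, then ranking segments against a text query embedding by cosine similarity. This is not OVO's merging method. The function names, the plain averaging rule, and the toy 3-D vectors (standing in for real, much higher-dimensional CLIP embeddings) are all hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = List[float]


def l2_normalize(v: Sequence[float]) -> Vector:
    """Scale a vector to unit length so dot products equal cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def merge_view_descriptors(views: Sequence[Sequence[float]]) -> Vector:
    """Fuse the per-view CLIP descriptors of one 3D segment.

    Hypothetical baseline: the mean of unit vectors, renormalized.
    This is NOT the paper's CLIP merging method.
    """
    if not views:
        raise ValueError("segment has no observations")
    dim = len(views[0])
    acc = [0.0] * dim
    for v in views:
        u = l2_normalize(v)  # weight every view equally
        for i in range(dim):
            acc[i] += u[i]
    return l2_normalize(acc)


def query_segments(
    segment_descriptors: Dict[int, Vector], text_embedding: Sequence[float]
) -> List[Tuple[int, float]]:
    """Rank segments by cosine similarity to an open-vocabulary text query."""
    t = l2_normalize(text_embedding)
    scores = [
        (seg_id, sum(a * b for a, b in zip(desc, t)))
        for seg_id, desc in segment_descriptors.items()
    ]
    return sorted(scores, key=lambda s: s[1], reverse=True)


# Toy 3-D vectors stand in for real CLIP embeddings (hypothetical values).
segments = {
    0: merge_view_descriptors([[1.0, 0.1, 0.0], [0.9, 0.0, 0.1]]),
    1: merge_view_descriptors([[0.0, 1.0, 0.0], [0.1, 0.8, 0.0]]),
}
ranking = query_segments(segments, [1.0, 0.0, 0.0])
print(ranking[0][0])  # prints 0
```

Equal-weight averaging ignores how informative each view is (for example, a distant or heavily occluded view counts as much as a close, clear one), which is exactly the kind of weakness a dedicated merging method would aim to address.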