Stag-1: Towards Realistic 4D Driving Simulation with Video Generation Model (Dec 6, 2024). Tasks: Autonomous Driving, Autonomous Vehicles. Code available.
EmbodiedOcc: Embodied 3D Occupancy Prediction for Vision-based Online Scene Understanding (Dec 5, 2024). Tasks: Prediction, Scene Understanding. Code available.
LSceneLLM: Enhancing Large 3D Scene Understanding Using Adaptive Visual Preferences (Dec 2, 2024). Tasks: Embodied Question Answering, Question Answering. Code available.
GEOBench-VLM: Benchmarking Vision-Language Models for Geospatial Tasks (Nov 28, 2024). Tasks: Benchmarking, Object Counting. Code available.
An End-to-End Robust Point Cloud Semantic Segmentation Network with Single-Step Conditional Diffusion Models (Nov 25, 2024). Tasks: Denoising, Scene Understanding. Code available.
GaussianPretrain: A Simple Unified 3D Gaussian Representation for Visual Pre-training in Autonomous Driving (Nov 19, 2024). Tasks: 3D Object Detection, Autonomous Driving. Code available.
OSMLoc: Single Image-Based Visual Localization in OpenStreetMap with Fused Geometric and Semantic Guidance (Nov 13, 2024). Tasks: Depth Estimation, Monocular Depth Estimation. Code available.
On Deep Learning for Geometric and Semantic Scene Understanding Using On-Vehicle 3D LiDAR (Nov 1, 2024). Tasks: 3D Semantic Segmentation, Autonomous Driving. Code available.
VLM-Grounder: A VLM Agent for Zero-Shot 3D Visual Grounding (Oct 17, 2024). Tasks: 3D geometry, 3D visual grounding. Code available.
ARKit LabelMaker: A New Scale for Indoor 3D Scene Understanding (Oct 17, 2024). Tasks: 3D Semantic Segmentation, Image Generation. Code available.
Procedure-Aware Surgical Video-language Pretraining with Hierarchical Knowledge Augmentation (Sep 30, 2024). Tasks: Cross-Modal Retrieval, Dynamic Time Warping. Code available.
Learning Multiple Probabilistic Decisions from Latent World Model in Autonomous Driving (Sep 24, 2024). Tasks: Autonomous Driving, Imitation Learning. Code available.
Hier-SLAM: Scaling-up Semantics in SLAM with a Hierarchically Categorical Gaussian Splatting (Sep 19, 2024). Tasks: Scene Understanding, Semantic Segmentation. Code available.
PrimeDepth: Efficient Monocular Depth Estimation with a Stable Diffusion Preimage (Sep 13, 2024). Tasks: Depth Estimation, Monocular Depth Estimation. Code available.
Lexicon3D: Probing Visual Foundation Models for Complex 3D Scene Understanding (Sep 5, 2024). Tasks: Question Answering, Scene Understanding. Code available.
Unveiling Deep Shadows: A Survey and Benchmark on Image and Video Shadow Detection, Removal, and Generation in the Deep Learning Era (Sep 3, 2024). Tasks: Scene Understanding, Shadow Detection. Code available.
RoboSense: Large-scale Dataset and Benchmark for Egocentric Robot Perception and Navigation in Crowded and Unstructured Environments (Aug 28, 2024). Tasks: Autonomous Driving, Autonomous Navigation. Code available.
A Unified Framework for 3D Scene Understanding (Jul 3, 2024). Tasks: Contrastive Learning, Knowledge Distillation. Code available.
StableSemantics: A Synthetic Language-Vision Dataset of Semantic Representations in Naturalistic Images (Jun 19, 2024). Tasks: Object Recognition, Scene Understanding. Code available.
RS-Agent: Automating Remote Sensing Tasks through Intelligent Agent (Jun 11, 2024). Tasks: AI Agent, Descriptive. Code available.
MTVQA: Benchmarking Multilingual Text-Centric Visual Question Answering (May 20, 2024). Tasks: Benchmarking, Question Answering. Code available.
Grounded 3D-LLM with Referent Tokens (May 16, 2024). Tasks: Dense Captioning, Diversity. Code available.
OpenESS: Event-based Semantic Scene Understanding with Open Vocabularies (May 8, 2024). Tasks: Domain Adaptation, Scene Understanding. Code available.
SPIdepth: Strengthened Pose Information for Self-supervised Monocular Depth Estimation (Apr 18, 2024). Tasks: Autonomous Driving, Depth Estimation. Code available.
NeRF-MAE: Masked AutoEncoders for Self-Supervised 3D Representation Learning for Neural Radiance Fields (Apr 1, 2024). Tasks: 3D Object Detection, NeRF. Code available.
Is Your LiDAR Placement Optimized for 3D Scene Understanding? (Mar 25, 2024). Tasks: 3D Object Detection, LIDAR Semantic Segmentation. Code available.
Calib3D: Calibrating Model Preferences for Reliable 3D Scene Understanding (Mar 25, 2024). Tasks: Data Augmentation, Scene Understanding. Code available.
Volumetric Environment Representation for Vision-Language Navigation (Mar 21, 2024). Tasks: 3D geometry, Multi-Task Learning. Code available.
FusionVision: A comprehensive approach of 3D object reconstruction and segmentation from RGB-D cameras using YOLO and fast segment anything (Feb 29, 2024). Tasks: 3D Object Reconstruction, Instance Segmentation. Code available.
Delving into Multi-modal Multi-task Foundation Models for Road Scene Understanding: From Learning Paradigm Perspectives (Feb 5, 2024). Tasks: Continual Learning, Multi-Task Learning. Code available.
Pixel-Wise Recognition for Holistic Surgical Scene Understanding (Jan 20, 2024). Tasks: Scene Understanding, Segmentation. Code available.
MAPLM: A Real-World Large-Scale Vision-Language Benchmark for Map and Traffic Scene Understanding (Jan 1, 2024). Tasks: Autonomous Driving, Instruction Following. Code available.
EmbodiedScan: A Holistic Multi-Modal 3D Perception Suite Towards Embodied AI (Dec 26, 2023). Task: Scene Understanding. Code available.
Chat-Scene: Bridging 3D Scene and Large Language Models with Object Identifiers (Dec 13, 2023). Tasks: 3D Question Answering (3D-QA), Attribute. Code available.
Gaussian Grouping: Segment and Edit Anything in 3D Scenes (Dec 1, 2023). Tasks: Colorization, NeRF. Code available.
SpectralGPT: Spectral Remote Sensing Foundation Model (Nov 13, 2023). Tasks: Change Detection, model. Code available.
On the Road with GPT-4V(ision): Early Explorations of Visual-Language Model on Autonomous Driving (Nov 9, 2023). Tasks: Autonomous Driving, Common Sense Reasoning. Code available.
OpenIns3D: Snap and Lookup for 3D Open-vocabulary Instance Segmentation (Sep 1, 2023). Tasks: 3D Open-Vocabulary Instance Segmentation, 3D Open-Vocabulary Object Detection. Code available.
ScanNet++: A High-Fidelity Dataset of 3D Indoor Scenes (Aug 22, 2023). Tasks: 3D Semantic Segmentation, Novel View Synthesis. Code available.
Chat-3D: Data-efficiently Tuning Large Language Model for Universal Dialogue of 3D Scenes (Aug 17, 2023). Tasks: Language Modeling, Language Modelling. Code available.
TaskExpert: Dynamically Assembling Multi-Task Representations with Memorial Mixture-of-Experts (Jul 28, 2023). Tasks: Long-range modeling, Mixture-of-Experts. Code available.
A Survey on Open-Vocabulary Detection and Segmentation: Past, Present, and Future (Jul 18, 2023). Tasks: Knowledge Distillation, object-detection. Code available.
Towards Open Vocabulary Learning: A Survey (Jun 28, 2023). Tasks: Open Set Learning, Out-of-Distribution Detection. Code available.
OpenMask3D: Open-Vocabulary 3D Instance Segmentation (Jun 23, 2023). Tasks: 3D Instance Segmentation, 3D Open-Vocabulary Instance Segmentation. Code available.
InvPT++: Inverted Pyramid Multi-Task Transformer for Visual Scene Understanding (Jun 8, 2023). Tasks: Decoder, Multi-Task Learning. Code available.
TextSLAM: Visual SLAM with Semantic Planar Text Features (May 17, 2023). Tasks: Mixed Reality, Object SLAM. Code available.
TaskPrompter: Spatial-Channel Multi-Task Prompting for Dense Scene Understanding (May 1, 2023). Tasks: 3D Object Detection, Monocular Depth Estimation. Code available.
Swin3D: A Pretrained Transformer Backbone for 3D Indoor Scene Understanding (Apr 14, 2023). Tasks: 3D Object Detection, Scene Understanding. Code available.
Graph-based Topology Reasoning for Driving Scenes (Apr 11, 2023). Tasks: 3D Lane Detection, Autonomous Driving. Code available.
RegionPLC: Regional Point-Language Contrastive Learning for Open-World 3D Scene Understanding (Apr 3, 2023). Tasks: Contrastive Learning, Instance Segmentation. Code available.