Paper | Date | Tasks | Code
SuperFlow++: Enhanced Spatiotemporal Consistency for Cross-Modal Data Pretraining | Mar 25, 2025 | Autonomous Driving, Computational Efficiency | Code Available 2
OpenLex3D: A New Evaluation Benchmark for Open-Vocabulary 3D Scene Representations | Mar 25, 2025 | 3D Semantic Segmentation, Scene Understanding | Unverified 0
The Coralscapes Dataset: Semantic Scene Understanding in Coral Reefs | Mar 25, 2025 | Benchmarking, Scene Segmentation | Code Available 1
Predicting the Road Ahead: A Knowledge Graph based Foundation Model for Scene Understanding in Autonomous Driving | Mar 24, 2025 | Autonomous Driving, Knowledge Graphs | Unverified 0
SceneSplat: Gaussian Splatting-based Scene Understanding with Vision-Language Pretraining | Mar 23, 2025 | 3DGS, Benchmarking | Code Available 3
MLLM-For3D: Adapting Multimodal Large Language Model for 3D Reasoning Segmentation | Mar 23, 2025 | Language Modeling, Language Modelling | Unverified 0
PanoGS: Gaussian-based Panoptic Segmentation for 3D Open Vocabulary Scene Understanding | Mar 23, 2025 | 3DGS, Decoder | Unverified 0
PolarFree: Polarization-based Reflection-free Imaging | Mar 23, 2025 | Reflection Removal, Scene Understanding | Code Available 2
PanopticSplatting: End-to-End Panoptic Gaussian Splatting | Mar 23, 2025 | global-optimization, NeRF | Unverified 0
Geometric Constrained Non-Line-of-Sight Imaging | Mar 23, 2025 | Scene Understanding, Surface Reconstruction | Unverified 0
ClaraVid: A Holistic Scene Reconstruction Benchmark From Aerial Perspective With Delentropy-Based Complexity Profiling | Mar 22, 2025 | Panoptic Segmentation, Scene Understanding | Unverified 0
ExCap3D: Expressive 3D Scene Understanding via Object Captioning with Varying Detail | Mar 21, 2025 | Object, Scene Understanding | Unverified 0
Cross-Modal and Uncertainty-Aware Agglomeration for Open-Vocabulary 3D Scene Understanding | Mar 20, 2025 | Scene Understanding | Code Available 1
IRef-VLA: A Benchmark for Interactive Referential Grounding with Imperfect Language in 3D Scenes | Mar 20, 2025 | Scene Understanding, Spatial Reasoning | Code Available 2
From Monocular Vision to Autonomous Action: Guiding Tumor Resection via 3D Reconstruction | Mar 20, 2025 | 3D Reconstruction, Anatomy | Unverified 0
SemanticFlow: A Self-Supervised Framework for Joint Scene Flow Prediction and Instance Segmentation in Dynamic Environments | Mar 19, 2025 | Autonomous Driving, Computational Efficiency | Unverified 0
These Magic Moments: Differentiable Uncertainty Quantification of Radiance Field Models | Mar 18, 2025 | Decision Making, Scene Understanding | Unverified 0
PSA-SSL: Pose and Size-aware Self-Supervised Learning on LiDAR Point Clouds | Mar 18, 2025 | 3D Object Detection, 3D Semantic Segmentation | Code Available 0
ChatBEV: A Visual Language Model that Understands BEV Maps | Mar 18, 2025 | Autonomous Driving, Language Modeling | Unverified 0
Crab: A Unified Audio-Visual Scene Understanding Model with Explicit Cooperation | Mar 17, 2025 | Data Interaction, Scene Understanding | Code Available 2
HIS-GPT: Towards 3D Human-In-Scene Multimodal Understanding | Mar 17, 2025 | Question Answering, Scene Understanding | Unverified 0
Learning-based 3D Reconstruction in Autonomous Driving: A Comprehensive Survey | Mar 17, 2025 | 3D Reconstruction, Autonomous Driving | Unverified 0
NuPlanQA: A Large-Scale Dataset and Benchmark for Multi-View Driving Scene Understanding in Multi-Modal Large Language Models | Mar 17, 2025 | Question Answering, Scene Understanding | Code Available 1
Logic-RAG: Augmenting Large Multimodal Models with Visual-Spatial Knowledge for Road Scene Understanding | Mar 16, 2025 | Autonomous Driving, RAG | Code Available 1
EgoSplat: Open-Vocabulary Egocentric Scene Understanding with Language Embedded 3D Gaussian Splatting | Mar 14, 2025 | Scene Understanding, Segmentation | Unverified 0
Road Rage Reasoning with Vision-language Models (VLMs): Task Definition and Evaluation Dataset | Mar 14, 2025 | Scene Understanding | Unverified 0
TARS: Traffic-Aware Radar Scene Flow Estimation | Mar 13, 2025 | Autonomous Driving, object-detection | Unverified 0
TGP: Two-modal occupancy prediction with 3D Gaussian and sparse points for 3D Environment Awareness | Mar 13, 2025 | Autonomous Driving, Prediction | Unverified 0
Graph-Grounded LLMs: Leveraging Graphical Function Calling to Minimize LLM Hallucinations | Mar 13, 2025 | Autonomous Vehicles, Knowledge Graphs | Unverified 0
Object-Aware DINO (Oh-A-Dino): Enhancing Self-Supervised Representations for Multi-Object Instance Retrieval | Mar 12, 2025 | Object Retrieval | Unverified 0
DIV-FF: Dynamic Image-Video Feature Fields For Environment Understanding in Egocentric Videos | Mar 11, 2025 | Scene Understanding | Unverified 0
MaskAttn-UNet: A Mask Attention-Driven Framework for Universal Low-Resolution Image Segmentation | Mar 11, 2025 | Image Segmentation, Panoptic Segmentation | Unverified 0
TrackOcc: Camera-based 4D Panoptic Occupancy Tracking | Mar 11, 2025 | 3D Object Tracking, Object Tracking | Code Available 2
General-Purpose Aerial Intelligent Agents Empowered by Large Language Models | Mar 11, 2025 | Motion Planning, Scene Understanding | Unverified 0
Generating Robot Constitutions & Benchmarks for Semantic Safety | Mar 11, 2025 | Collision Avoidance, Image Generation | Unverified 0
LLaFEA: Frame-Event Complementary Fusion for Fine-Grained Spatiotemporal Understanding in LMMs | Mar 10, 2025 | Position, Scene Understanding | Unverified 0
Chameleon: Fast-slow Neuro-symbolic Lane Topology Extraction | Mar 10, 2025 | Autonomous Driving, Scene Understanding | Code Available 2
A Data-Centric Revisit of Pre-Trained Vision Models for Robot Learning | Mar 10, 2025 | Object, Scene Understanding | Code Available 1
CoT-Drive: Efficient Motion Forecasting for Autonomous Driving with LLMs and Chain-of-Thought Prompting | Mar 10, 2025 | Autonomous Driving, Knowledge Distillation | Unverified 0
Segment Anything, Even Occluded | Mar 8, 2025 | Amodal Instance Segmentation, Autonomous Driving | Unverified 0
Feature-EndoGaussian: Feature Distilled Gaussian Splatting in Surgical Deformable Scene Reconstruction | Mar 8, 2025 | 3DGS, image-classification | Unverified 0
SplatTalk: 3D VQA with Gaussian Splatting | Mar 8, 2025 | 3DGS, Question Answering | Unverified 0
VLScene: Vision-Language Guidance Distillation for Camera-Based 3D Semantic Scene Completion | Mar 8, 2025 | 3D Semantic Scene Completion, Autonomous Driving | Code Available 1
Towards Ambiguity-Free Spatial Foundation Model: Rethinking and Decoupling Depth Ambiguity | Mar 8, 2025 | Depth Estimation, Scene Understanding | Code Available 0
An Egocentric Vision-Language Model based Portable Real-time Smart Assistant | Mar 6, 2025 | Language Modeling, Language Modelling | Code Available 2
EvidMTL: Evidential Multi-Task Learning for Uncertainty-Aware Semantic Surface Mapping from Monocular RGB Images | Mar 6, 2025 | Depth Estimation, Depth Prediction | Unverified 0
Vision-Language Models Struggle to Align Entities across Modalities | Mar 5, 2025 | Attribute, Code Generation | Unverified 0
SurgiSAM2: Fine-tuning a foundational model for surgical video anatomy segmentation and detection | Mar 5, 2025 | Anatomy, Scene Segmentation | Unverified 0
Improving 6D Object Pose Estimation of metallic Household and Industry Objects | Mar 5, 2025 | 6D Pose Estimation using RGB, Pose Estimation | Unverified 0
SSNet: Saliency Prior and State Space Model-based Network for Salient Object Detection in RGB-D Images | Mar 4, 2025 | object-detection, Object Detection | Unverified 0