A2-FPN for Semantic Segmentation of Fine-Resolution Remotely Sensed Images Feb 16, 2021 Decision Making Scene Understanding
Code Code Available 15 M3D-RPN: Monocular 3D Region Proposal Network for Object Detection Jul 13, 2019 3D Object Detection 3D Object Detection From Monocular Images
Code Code Available 15 MassMIND: Massachusetts Maritime INfrared Dataset Sep 9, 2022 Instance Segmentation Scene Understanding
Code Code Available 15 Masked Scene Modeling: Narrowing the Gap Between Supervised and Self-Supervised Learning in 3D Scene Understanding Apr 9, 2025 Scene Understanding Self-Supervised Learning
Code Code Available 15 Panoramic Panoptic Segmentation: Insights Into Surrounding Parsing for Mobile Agents via Unsupervised Contrastive Learning Jun 21, 2022 Contrastive Learning Domain Generalization
Code Code Available 15 PC-BEV: An Efficient Polar-Cartesian BEV Fusion Framework for LiDAR Semantic Segmentation Dec 19, 2024 LIDAR Semantic Segmentation Scene Understanding
Code Code Available 15 PiMAE: Point Cloud and Image Interactive Masked Autoencoders for 3D Object Detection Mar 14, 2023 3D Object Detection Decoder
Code Code Available 15 Class-Incremental Domain Adaptation with Smoothing and Calibration for Surgical Report Generation Jul 23, 2021 Domain Adaptation Few-Shot Learning
Code Code Available 15 STSBench: A Spatio-temporal Scenario Benchmark for Multi-modal Large Language Models in Autonomous Driving Jun 6, 2025 Autonomous Driving Autonomous Vehicles
Code Code Available 15 Distilled Semantics for Comprehensive Scene Understanding from Videos Mar 31, 2020 Depth Estimation Knowledge Distillation
Code Code Available 15 Event-aided Semantic Scene Completion Feb 4, 2025 Autonomous Driving Scene Understanding
Code Code Available 15 Microsoft COCO: Common Objects in Context May 1, 2014 Instance Segmentation Object
Code Code Available 15 Estimating and Exploiting the Aleatoric Uncertainty in Surface Normal Estimation Sep 20, 2021 Decoder Prediction
Code Code Available 15 SurgTPGS: Semantic 3D Surgical Scene Understanding with Text Promptable Gaussian Splatting Jun 29, 2025 3D Reconstruction Scene Understanding
Code Code Available 15 Estimating Generic 3D Room Structures from 2D Annotations Jun 15, 2023 Scene Understanding
Code Code Available 15 Boosting Omnidirectional Stereo Matching with a Pre-trained Depth Foundation Model Mar 30, 2025 Depth Estimation Monocular Depth Estimation
Code Code Available 15 Bootstraping Clustering of Gaussians for View-consistent 3D Scene Understanding Nov 29, 2024 3D geometry 3DGS
Code Code Available 15 Event-based Motion Segmentation with Spatio-Temporal Graph Cuts Dec 16, 2020 Motion Segmentation Scene Understanding
Code Code Available 15 PanopticNDT: Efficient and Robust Panoptic Mapping Sep 24, 2023 2D Panoptic Segmentation 3D Panoptic Segmentation
Code Code Available 15 A Versatile and Efficient Reinforcement Learning Framework for Autonomous Driving Oct 22, 2021 Autonomous Driving reinforcement-learning
Code Code Available 15 EndoChat: Grounded Multimodal Large Language Model for Endoscopic Surgery Jan 20, 2025 Language Modeling
Code Code Available 15 0-MMS: Zero-Shot Multi-Motion Segmentation With A Monocular Event Camera Jun 11, 2020 Motion Compensation Motion Segmentation
Code Code Available 15 A Data-Centric Revisit of Pre-Trained Vision Models for Robot Learning Mar 10, 2025 Object Scene Understanding
Code Code Available 15 DPF: Learning Dense Prediction Fields with Weak Supervision Mar 29, 2023 Intrinsic Image Decomposition Prediction
Code Code Available 15 Enhancing Scene Graph Generation with Hierarchical Relationships and Commonsense Knowledge Nov 21, 2023 Large Language Model Multimodal Deep Learning
Code Code Available 15 MonoDistill: Learning Spatial Features for Monocular 3D Object Detection Jan 26, 2022 3D Object Detection Monocular 3D Object Detection
Code Code Available 15 PanoOcc: Unified Occupancy Representation for Camera-based 3D Panoptic Segmentation Jun 16, 2023 3D Panoptic Segmentation Autonomous Driving
Code Code Available 15 MSeg: A Composite Dataset for Multi-domain Semantic Segmentation Dec 27, 2021 Computational Efficiency Instance Segmentation
Code Code Available 15 Explainable Object-induced Action Decision for Autonomous Vehicles Mar 20, 2020 Autonomous Driving Autonomous Vehicles
Code Code Available 15 TextSLAM: Visual SLAM with Planar Text Features Nov 26, 2019 Object SLAM Scene Understanding
Code Code Available 15 OK-VQA: A Visual Question Answering Benchmark Requiring External Knowledge May 31, 2019 Object Detection
Code Code Available 15 MuKEA: Multimodal Knowledge Extraction and Accumulation for Knowledge-based Visual Question Answering Mar 17, 2022 Implicit Relations Question Answering
Code Code Available 15 Multi3DRefer: Grounding Text Description to Multiple 3D Objects Sep 11, 2023 3D visual grounding Contrastive Learning
Code Code Available 15 DTCLMapper: Dual Temporal Consistent Learning for Vectorized HD Map Construction May 9, 2024 Contrastive Learning Scene Understanding
Code Code Available 15 Panoptic 3D Scene Reconstruction From a Single RGB Image Nov 3, 2021 2D Panoptic Segmentation 3D Instance Segmentation
Code Code Available 15 Dual-Hybrid Attention Network for Specular Highlight Removal Jul 17, 2024 highlight removal Object Recognition
Code Code Available 15 Multimodal Dataset for Localization, Mapping and Crop Monitoring in Citrus Tree Farms Sep 27, 2023 Object Detection
Code Code Available 15 Egocentric Scene Understanding via Multimodal Spatial Rectifier Jul 14, 2022 Scene Understanding Surface Normal Estimation
Code Code Available 15 Cityscapes-Panoptic-Parts and PASCAL-Panoptic-Parts datasets for Scene Understanding Apr 16, 2020 Human Part Segmentation Panoptic Segmentation
Code Code Available 15 Efficient Multi-Task RGB-D Scene Analysis for Indoor Environments Jul 10, 2022 Instance Segmentation Panoptic Segmentation
Code Code Available 15 Dynamic Graph Message Passing Networks for Visual Recognition Sep 20, 2022 Image Classification
Code Code Available 15 Bridging the Domain Gap: Self-Supervised 3D Scene Understanding with Foundation Models May 15, 2023 3D Object Detection Image Captioning
Code Code Available 15 Exploiting Edge-Oriented Reasoning for 3D Point-based Scene Graph Analysis Mar 9, 2021 3d scene graph generation graph construction
Code Code Available 15 Dynamic Scene Understanding through Object-Centric Voxelization and Neural Rendering Jul 30, 2024 Inverse Rendering NeRF
Code Code Available 15 Multi-Scale Attention for Audio Question Answering May 29, 2023 Audio Question Answering Question Answering
Code Code Available 15 Multi-stage Factorized Spatio-Temporal Representation for RGB-D Action and Gesture Recognition Aug 23, 2023 Gesture Recognition Scene Understanding
Code Code Available 15 P2T: Pyramid Pooling Transformer for Scene Understanding Jun 22, 2021 Image Classification
Code Code Available 15 ARKitScenes: A Diverse Real-World Dataset For 3D Indoor Scene Understanding Using Mobile RGB-D Data Nov 17, 2021 3D Object Detection object-detection
Code Code Available 15 ECLAIR: A High-Fidelity Aerial LiDAR Dataset for Semantic Segmentation Apr 16, 2024 3D Semantic Segmentation Management
Code Code Available 15 3UR-LLM: An End-to-End Multimodal Large Language Model for 3D Scene Understanding Jan 14, 2025 Language Modeling
Code Code Available 15