General Geometry-aware Weakly Supervised 3D Object Detection | Jul 18, 2024 | 3D Object Detection; Object | Code Available | 1
Generating Visual Spatial Description via Holistic 3D Scene Understanding | May 19, 2023 | Scene Understanding; Text Generation | Code Available | 1
Towards Holistic Surgical Scene Understanding | Dec 8, 2022 | Action Recognition; Atomic action recognition | Code Available | 1
Towards In-context Scene Understanding | Jun 2, 2023 | Depth Estimation; In-Context Learning | Code Available | 1
Efficient Multi-Task RGB-D Scene Analysis for Indoor Environments | Jul 10, 2022 | Instance Segmentation; Panoptic Segmentation | Code Available | 1
CAKES: Channel-wise Automatic KErnel Shrinking for Efficient 3D Networks | Mar 28, 2020 | 3D Medical Imaging Segmentation; Action Recognition | Code Available | 1
Global Aggregation then Local Distribution in Fully Convolutional Networks | Sep 16, 2019 | Instance Segmentation; object-detection | Code Available | 1
Towards Scene Understanding for Autonomous Operations on Airport Aprons | Dec 4, 2022 | Autonomous Driving; Benchmarking | Code Available | 1
ARKitScenes: A Diverse Real-World Dataset For 3D Indoor Scene Understanding Using Mobile RGB-D Data | Nov 17, 2021 | 3D Object Detection; object-detection | Code Available | 1
3UR-LLM: An End-to-End Multimodal Large Language Model for 3D Scene Understanding | Jan 14, 2025 | Language Modeling; Language Modelling | Code Available | 1
CamContextI2V: Context-aware Controllable Video Generation | Apr 8, 2025 | Diversity; Scene Understanding | Code Available | 1
Traffic Scene Parsing through the TSP6K Dataset | Mar 6, 2023 | Autonomous Driving; Decoder | Code Available | 1
Channel-Wise Attention-Based Network for Self-Supervised Monocular Depth Estimation | Dec 24, 2021 | Depth Estimation; Depth Prediction | Code Available | 1
Explainable Object-induced Action Decision for Autonomous Vehicles | Mar 20, 2020 | Autonomous Driving; Autonomous Vehicles | Code Available | 1
Transformers in Self-Supervised Monocular Depth Estimation with Unknown Camera Intrinsics | Feb 7, 2022 | Autonomous Driving; Depth Estimation | Code Available | 1
Global-Reasoned Multi-Task Learning Model for Surgical Scene Understanding | Jan 28, 2022 | Graph Attention; Knowledge Distillation | Code Available | 1
TSP-Transformer: Task-Specific Prompts Boosted Transformer for Holistic Scene Understanding | Nov 6, 2023 | Boundary Detection; Depth Estimation | Code Available | 1
Uncertainty-aware Panoptic Segmentation | Jun 29, 2022 | Panoptic Segmentation; Scene Understanding | Code Available | 1
Campus3D: A Photogrammetry Point Cloud Benchmark for Hierarchical Understanding of Outdoor Scene | Aug 11, 2020 | Instance Segmentation; Point Cloud Segmentation | Code Available | 1
Understanding Bird's-Eye View of Road Semantics using an Onboard Camera | Dec 5, 2020 | Autonomous Navigation; Autonomous Vehicles | Code Available | 1
Holistic 3D Scene Understanding from a Single Image with Implicit Representation | Mar 11, 2021 | 3D Object Detection; 3D Shape Reconstruction | Code Available | 1
Knowledge Distillation from 3D to Bird's-Eye-View for LiDAR Semantic Segmentation | Apr 22, 2023 | Autonomous Driving; Knowledge Distillation | Code Available | 1
UniM-OV3D: Uni-Modality Open-Vocabulary 3D Scene Understanding with Fine-Grained Feature Representation | Jan 21, 2024 | Instance Segmentation; Scene Understanding | Code Available | 1
Unleash the Potential of Image Branch for Cross-modal 3D Object Detection | Jan 22, 2023 | 3D Object Detection; Autonomous Vehicles | Code Available | 1
EndoChat: Grounded Multimodal Large Language Model for Endoscopic Surgery | Jan 20, 2025 | Language Modeling; Language Modelling | Code Available | 1
UrBench: A Comprehensive Benchmark for Evaluating Large Multimodal Models in Multi-View Urban Scenarios | Aug 30, 2024 | Attribute; geo-localization | Code Available | 1
MonteBoxFinder: Detecting and Filtering Primitives to Fit a Noisy Point Cloud | Jul 28, 2022 | Scene Understanding | Code Available | 1
VideoNavQA: Bridging the Gap between Visual and Embodied Question Answering | Aug 14, 2019 | Embodied Question Answering; Question Answering | Code Available | 1
PyTorchGeoNodes: Enabling Differentiable Shape Programs for 3D Shape Reconstruction | Apr 16, 2024 | 3D Reconstruction; 3D Shape Reconstruction | Code Available | 1
Challenges for Monocular 6D Object Pose Estimation in Robotics | Jul 22, 2023 | 6D Pose Estimation using RGB; Object | Unverified | 0
ArK: Augmented Reality with Knowledge Interactive Emergent Ability | May 1, 2023 | AI Agent; Mixed Reality | Unverified | 0
Argus: Leveraging Multiview Images for Improved 3-D Scene Understanding With Large Language Models | Jul 17, 2025 | 3D Point Cloud Reconstruction; Point cloud reconstruction | Unverified | 0
Adversarial Attacks on Monocular Depth Estimation | Mar 23, 2020 | Autonomous Driving; Depth Estimation | Unverified | 0
Advancing the Understanding of Fine-Grained 3D Forest Structures using Digital Cousins and Simulation-to-Reality: Methods and Datasets | Jan 7, 2025 | Data Augmentation; parameter estimation | Unverified | 0
3D Vision-Language Gaussian Splatting | Oct 10, 2024 | 3D Reconstruction; Autonomous Driving | Unverified | 0
Category-Level and Open-Set Object Pose Estimation for Robotics | Apr 28, 2025 | 6D Pose Estimation; 6D Pose Estimation using RGB | Unverified | 0
Evaluation of Multimodal Semantic Segmentation using RGB-D Data | Mar 31, 2021 | Scene Understanding; Semantic Segmentation | Unverified | 0
Catch Me if You Can: A Novel Task for Detection of Covert Geo-Locations (CGL) | Feb 5, 2022 | object-detection; Object Detection | Unverified | 0
A Review on Visual-SLAM: Advancements from Geometric Modelling to Learning-based Semantic Scene Understanding | Sep 12, 2022 | Scene Understanding | Unverified | 0
GaussianBeV: 3D Gaussian Representation meets Perception Models for BeV Segmentation | Jul 19, 2024 | BEV Segmentation; Scene Understanding | Unverified | 0
Evaluating the Impact of Point Cloud Colorization on Semantic Segmentation Accuracy | Oct 9, 2024 | Colorization; Point Cloud Segmentation | Unverified | 0
Evaluating Multimodal Language Models as Visual Assistants for Visually Impaired Users | Mar 28, 2025 | Object Recognition; Reading Comprehension | Unverified | 0
Cataract-1K: Cataract Surgery Dataset for Scene Segmentation, Phase Recognition, and Irregularity Detection | Dec 11, 2023 | Benchmarking; Domain Adaptation | Unverified | 0
CASPNet++: Joint Multi-Agent Motion Prediction | Aug 15, 2023 | Autonomous Driving; motion prediction | Unverified | 0
GameVLM: A Decision-making Framework for Robotic Task Planning Based on Visual Language Models and Zero-sum Games | May 22, 2024 | Code Generation; Decision Making | Unverified | 0
Estimating Depth from Monocular Images as Classification Using Deep Fully Convolutional Residual Networks | May 8, 2016 | Depth Estimation; General Classification | Unverified | 0
Case-based Reasoning Augmented Large Language Model Framework for Decision Making in Realistic Safety-Critical Driving Scenarios | Jun 25, 2025 | Autonomous Driving; Decision Making | Unverified | 0
Event fields: Capturing light fields at high speed, resolution, and dynamic range | Dec 9, 2024 | Depth Estimation; Scene Understanding | Unverified | 0
Every SAM Drop Counts: Embracing Semantic Priors for Multi-Modality Image Fusion and Beyond | Mar 3, 2025 | Infrared And Visible Image Fusion; Scene Understanding | Unverified | 0
ESGNN: Towards Equivariant Scene Graph Neural Network for 3D Scene Understanding | Jun 30, 2024 | Graph Generation; Graph Neural Network | Unverified | 0