On the Road with GPT-4V(ision): Early Explorations of Visual-Language Model on Autonomous Driving | Nov 9, 2023 | Autonomous Driving, Common Sense Reasoning
TrackOcc: Camera-based 4D Panoptic Occupancy Tracking | Mar 11, 2025 | 3D Object Tracking, Object Tracking
InvPT: Inverted Pyramid Multi-task Transformer for Dense Scene Understanding | Mar 15, 2022 | Boundary Detection, Human Parsing
AutoTrust: Benchmarking Trustworthiness in Large Vision Language Models for Autonomous Driving | Dec 19, 2024 | Autonomous Driving, Benchmarking
Hier-SLAM: Scaling-up Semantics in SLAM with a Hierarchically Categorical Gaussian Splatting | Sep 19, 2024 | Scene Understanding, Semantic Segmentation
Emma-X: An Embodied Multimodal Action Model with Grounded Chain of Thought and Look-ahead Spatial Reasoning | Dec 16, 2024 | Hallucination, Robot Manipulation
InvPT++: Inverted Pyramid Multi-Task Transformer for Visual Scene Understanding | Jun 8, 2023 | Decoder, Multi-Task Learning
HAKE: A Knowledge Engine Foundation for Human Activity Understanding | Feb 14, 2022 | Action Recognition, Human-Object Interaction Detection
Chameleon: Fast-slow Neuro-symbolic Lane Topology Extraction | Mar 10, 2025 | Autonomous Driving, Scene Understanding
CLIP goes 3D: Leveraging Prompt Tuning for Language Grounded 3D Recognition | Mar 20, 2023 | Retrieval, Scene Understanding
ARKit LabelMaker: A New Scale for Indoor 3D Scene Understanding | Oct 17, 2024 | 3D Semantic Segmentation, Image Generation
Grounded 3D-LLM with Referent Tokens | May 16, 2024 | Dense Captioning, Diversity
IRef-VLA: A Benchmark for Interactive Referential Grounding with Imperfect Language in 3D Scenes | Mar 20, 2025 | Scene Understanding, Spatial Reasoning
RegionPLC: Regional Point-Language Contrastive Learning for Open-World 3D Scene Understanding | Apr 3, 2023 | Contrastive Learning, Instance Segmentation
Tackling View-Dependent Semantics in 3D Language Gaussian Splatting | May 30, 2025 | 3D Scene Reconstruction, Scene Understanding
Inst3D-LMM: Instance-Aware 3D Scene Understanding with Multi-modal Instruction Tuning | Mar 1, 2025 | Scene Understanding
Generating Visual Spatial Description via Holistic 3D Scene Understanding | May 19, 2023 | Scene Understanding, Text Generation
General Geometry-aware Weakly Supervised 3D Object Detection | Jul 18, 2024 | 3D Object Detection, Object
GFF: Gated Fully Fusion for Semantic Segmentation | Apr 3, 2019 | Scene Parsing, Scene Understanding
3DMIT: 3D Multi-modal Instruction Tuning for Scene Understanding | Jan 6, 2024 | Scene Understanding, Visual Question Answering (VQA)
A Review of Panoptic Segmentation for Mobile Mapping Point Clouds | Apr 27, 2023 | Instance Segmentation, Panoptic Segmentation
Advances in Deep Concealed Scene Understanding | Apr 21, 2023 | Scene Understanding, Semantic Segmentation
F-ViTA: Foundation Model Guided Visible to Thermal Translation | Apr 3, 2025 | Scene Understanding, Style Transfer
Global Aggregation then Local Distribution in Fully Convolutional Networks | Sep 16, 2019 | Instance Segmentation, object-detection
FPS-Net: A Convolutional Fusion Network for Large-Scale LiDAR Point Cloud Segmentation | Mar 1, 2021 | 3D Semantic Segmentation, Decoder
FocusFlow: Boosting Key-Points Optical Flow Estimation for Autonomous Driving | Aug 14, 2023 | Autonomous Driving, Optical Flow Estimation
FreDSNet: Joint Monocular Depth and Semantic Segmentation with Fast Fourier Convolutions | Oct 4, 2022 | Depth Estimation, Monocular Depth Estimation
Few-Shot Object Detection and Viewpoint Estimation for Objects in the Wild | Jul 23, 2020 | Few-Shot Object Detection, Meta-Learning
Arabic Scene Text Recognition in the Deep Learning Era: Analysis on A Novel Dataset | Jul 27, 2021 | Scene Text Recognition, Scene Understanding
FloodNet: A High Resolution Aerial Imagery Dataset for Post Flood Scene Understanding | Dec 5, 2020 | image-classification, Image Classification
From General to Specific: Informative Scene Graph Generation via Balance Adjustment | Aug 30, 2021 | Blocking, Graph Generation
Extending Large Vision-Language Model for Diverse Interactive Tasks in Autonomous Driving | May 13, 2025 | 3D visual grounding, Autonomous Driving
Expressive Scene Graph Generation Using Commonsense Knowledge Infusion for Visual Understanding and Reasoning | May 31, 2022 | Common Sense Reasoning, Graph Generation
A2-FPN for Semantic Segmentation of Fine-Resolution Remotely Sensed Images | Feb 16, 2021 | Decision Making, Scene Understanding
Exploiting Edge-Oriented Reasoning for 3D Point-based Scene Graph Analysis | Mar 9, 2021 | 3d scene graph generation, graph construction
AVSegFormer: Audio-Visual Segmentation with Transformer | Jul 3, 2023 | Decoder, Scene Understanding
Exploring Data-Efficient 3D Scene Understanding with Contrastive Scene Contexts | Dec 16, 2020 | 3D Semantic Segmentation, Instance Segmentation
From Multi-View to Hollow-3D: Hallucinated Hollow-3D R-CNN for 3D Object Detection | Jul 30, 2021 | 3D Object Detection, object-detection
Global-Reasoned Multi-Task Learning Model for Surgical Scene Understanding | Jan 28, 2022 | Graph Attention, Knowledge Distillation
Estimating and Exploiting the Aleatoric Uncertainty in Surface Normal Estimation | Sep 20, 2021 | Decoder, Prediction
Auto-Panoptic: Cooperative Multi-Component Architecture Search for Panoptic Segmentation | Oct 30, 2020 | Instance Segmentation, Panoptic Segmentation
Estimating Generic 3D Room Structures from 2D Annotations | Jun 15, 2023 | Scene Understanding
Automatic Extrinsic Calibration Method for LiDAR and Camera Sensor Setups | Jan 12, 2021 | Scene Understanding
OK-VQA: A Visual Question Answering Benchmark Requiring External Knowledge | May 31, 2019 | object-detection, Object Detection
Event-aided Semantic Scene Completion | Feb 4, 2025 | Autonomous Driving, Scene Understanding
A Data-Centric Revisit of Pre-Trained Vision Models for Robot Learning | Mar 10, 2025 | Object, Scene Understanding
EndoChat: Grounded Multimodal Large Language Model for Endoscopic Surgery | Jan 20, 2025 | Language Modeling, Language Modelling
3DRM:Pair-wise relation module for 3D object detection | Feb 20, 2022 | 3D Object Detection, Object
Enhancing Scene Graph Generation with Hierarchical Relationships and Commonsense Knowledge | Nov 21, 2023 | Large Language Model, Multimodal Deep Learning
Event-based Motion Segmentation with Spatio-Temporal Graph Cuts | Dec 16, 2020 | Motion Segmentation, Scene Understanding