CLIP goes 3D: Leveraging Prompt Tuning for Language Grounded 3D Recognition Mar 20, 2023 Retrieval Scene Understanding
GALIP: Generative Adversarial CLIPs for Text-to-Image Synthesis Jan 30, 2023 Image Generation Scene Understanding
Diffusion-based Generation, Optimization, and Planning in 3D Scenes Jan 15, 2023 Denoising Grasp Generation
Panoptic Lifting for 3D Scene Understanding with Neural Fields Dec 19, 2022 2D Panoptic Segmentation Panoptic Segmentation
PLA: Language-Driven Open-Vocabulary 3D Scene Understanding Nov 29, 2022 3D Open-Vocabulary Instance Segmentation Contrastive Learning
OpenScene: 3D Scene Understanding with Open Vocabularies Nov 28, 2022 3D Open-Vocabulary Instance Segmentation 3D Semantic Segmentation
Safety-Enhanced Autonomous Driving Using Interpretable Sensor Fusion Transformer Jul 28, 2022 Autonomous Driving Autonomous Vehicles
Panoptic Scene Graph Generation Jul 22, 2022 Benchmarking Panoptic Scene Graph Generation
BinsFormer: Revisiting Adaptive Bins for Monocular Depth Estimation Apr 3, 2022 Decoder Depth Estimation
InvPT: Inverted Pyramid Multi-task Transformer for Dense Scene Understanding Mar 15, 2022 Boundary Detection Human Parsing
CMX: Cross-Modal Fusion for RGB-X Semantic Segmentation with Transformers Mar 9, 2022 3D Object Detection Autonomous Vehicles
GroupViT: Semantic Segmentation Emerges from Text Supervision Feb 22, 2022 Object Detection Scene Understanding
HAKE: A Knowledge Engine Foundation for Human Activity Understanding Feb 14, 2022 Action Recognition Human-Object Interaction Detection
Panoptic nuScenes: A Large-Scale Benchmark for LiDAR Panoptic Segmentation and Tracking Sep 8, 2021 Benchmarking Diversity
Hypersim: A Photorealistic Synthetic Dataset for Holistic Indoor Scene Understanding Nov 4, 2020 Multi-Task Learning Scene Understanding
Multi-Task Learning as Multi-Objective Optimization Oct 10, 2018 Depth Estimation General Classification
Learning to Tune Like an Expert: Interpretable and Scene-Aware Navigation via MLLM Reasoning and CVAE-Based Adaptation Jul 15, 2025 Large Language Model Scene Understanding
SurgTPGS: Semantic 3D Surgical Scene Understanding with Text Promptable Gaussian Splatting Jun 29, 2025 3D Reconstruction Scene Understanding
ReME: A Data-Centric Framework for Training-Free Open-Vocabulary Segmentation Jun 26, 2025 Open Vocabulary Semantic Segmentation Open-Vocabulary Semantic Segmentation
DIP: Unsupervised Dense In-Context Post-training of Visual Representations Jun 23, 2025 GPU Meta-Learning
STSBench: A Spatio-temporal Scenario Benchmark for Multi-modal Large Language Models in Autonomous Driving Jun 6, 2025 Autonomous Driving Autonomous Vehicles
OWMM-Agent: Open World Mobile Manipulation With Multi-modal Agentic Data Synthesis Jun 4, 2025 Action Generation Decision Making
PhysGaia: A Physics-Aware Dataset of Multi-Body Interactions for Dynamic Novel View Synthesis Jun 3, 2025 Novel View Synthesis Scene Understanding
CoNav: Collaborative Cross-Modal Reasoning for Embodied Navigation May 22, 2025 Scene Understanding Spatial Reasoning
StoryReasoning Dataset: Using Chain-of-Thought for Scene Understanding and Grounded Story Generation May 15, 2025 Face Recognition Object
Extending Large Vision-Language Model for Diverse Interactive Tasks in Autonomous Driving May 13, 2025 3D visual grounding Autonomous Driving
Hearing and Seeing Through CLIP: A Framework for Self-Supervised Sound Source Localization May 8, 2025 Scene Understanding Sound Source Localization
LLM-Empowered Embodied Agent for Memory-Augmented Task Planning in Household Robotics Apr 30, 2025 In-Context Learning Object
Training-Free Hierarchical Scene Understanding for Gaussian Splatting with Superpoint Graphs Apr 17, 2025 3D geometry 3DGS
DC-SAM: In-Context Segment Anything in Images and Videos via Dual Consistency Apr 16, 2025 Few-Shot Learning Interactive Segmentation
SoccerNet-v3D: Leveraging Sports Broadcast Replays for 3D Scene Understanding Apr 14, 2025 Camera Calibration Object Localization
Masked Scene Modeling: Narrowing the Gap Between Supervised and Self-Supervised Learning in 3D Scene Understanding Apr 9, 2025 Scene Understanding Self-Supervised Learning
CamContextI2V: Context-aware Controllable Video Generation Apr 8, 2025 Diversity Scene Understanding
F-ViTA: Foundation Model Guided Visible to Thermal Translation Apr 3, 2025 Scene Understanding Style Transfer
Multimodal Fusion and Vision-Language Models: A Survey for Robot Vision Apr 3, 2025 3D Object Detection cross-modal alignment
WikiVideo: Article Generation from Multiple Videos Apr 1, 2025 Articles RAG
Boosting Omnidirectional Stereo Matching with a Pre-trained Depth Foundation Model Mar 30, 2025 Depth Estimation Monocular Depth Estimation
Mitigating Trade-off: Stream and Query-guided Aggregation for Efficient and Effective 3D Occupancy Prediction Mar 28, 2025 Autonomous Driving Scene Understanding
The Coralscapes Dataset: Semantic Scene Understanding in Coral Reefs Mar 25, 2025 Benchmarking Scene Segmentation
Cross-Modal and Uncertainty-Aware Agglomeration for Open-Vocabulary 3D Scene Understanding Mar 20, 2025 Scene Understanding
NuPlanQA: A Large-Scale Dataset and Benchmark for Multi-View Driving Scene Understanding in Multi-Modal Large Language Models Mar 17, 2025 Question Answering Scene Understanding
Logic-RAG: Augmenting Large Multimodal Models with Visual-Spatial Knowledge for Road Scene Understanding Mar 16, 2025 Autonomous Driving RAG
A Data-Centric Revisit of Pre-Trained Vision Models for Robot Learning Mar 10, 2025 Object Scene Understanding
VLScene: Vision-Language Guidance Distillation for Camera-Based 3D Semantic Scene Completion Mar 8, 2025 3D Semantic Scene Completion Autonomous Driving
Occlusion-aware Non-Rigid Point Cloud Registration via Unsupervised Neural Deformation Correntropy Feb 15, 2025 Point Cloud Registration Scene Understanding
Event-aided Semantic Scene Completion Feb 4, 2025 Autonomous Driving Scene Understanding
EndoChat: Grounded Multimodal Large Language Model for Endoscopic Surgery Jan 20, 2025 Language Modeling Language Modelling
A Survey of World Models for Autonomous Driving Jan 20, 2025 Anomaly Detection Autonomous Driving
3UR-LLM: An End-to-End Multimodal Large Language Model for 3D Scene Understanding Jan 14, 2025 Language Modeling Language Modelling
All-Day Multi-Camera Multi-Target Tracking Jan 1, 2025 All Mamba