RelationField: Relate Anything in Radiance Fields | Dec 18, 2024 | 3D Scene Graph Generation, Graph Generation | Code available
Is Your LiDAR Placement Optimized for 3D Scene Understanding? | Mar 25, 2024 | 3D Object Detection, LIDAR Semantic Segmentation | Code available
RS-Agent: Automating Remote Sensing Tasks through Intelligent Agent | Jun 11, 2024 | AI Agent, Descriptive | Code available
Safety-Enhanced Autonomous Driving Using Interpretable Sensor Fusion Transformer | Jul 28, 2022 | Autonomous Driving, Autonomous Vehicles | Code available
Multi-Task Learning as Multi-Objective Optimization | Oct 10, 2018 | Depth Estimation, General Classification | Code available
MTVQA: Benchmarking Multilingual Text-Centric Visual Question Answering | May 20, 2024 | Benchmarking, Question Answering | Code available
NavRAG: Generating User Demand Instructions for Embodied Navigation through Retrieval-Augmented LLM | Feb 16, 2025 | Navigate, RAG | Code available
Delving into Multi-modal Multi-task Foundation Models for Road Scene Understanding: From Learning Paradigm Perspectives | Feb 5, 2024 | Continual Learning, Multi-Task Learning | Code available
LSceneLLM: Enhancing Large 3D Scene Understanding Using Adaptive Visual Preferences | Dec 2, 2024 | Embodied Question Answering, Question Answering | Code available
PLA: Language-Driven Open-Vocabulary 3D Scene Understanding | Nov 29, 2022 | 3D Open-Vocabulary Instance Segmentation, Contrastive Learning | Code available
Diffusion-based Generation, Optimization, and Planning in 3D Scenes | Jan 15, 2023 | Denoising, Grasp Generation | Code available
StableSemantics: A Synthetic Language-Vision Dataset of Semantic Representations in Naturalistic Images | Jun 19, 2024 | Object Recognition, Scene Understanding | Code available
MAPLM: A Real-World Large-Scale Vision-Language Benchmark for Map and Traffic Scene Understanding | Jan 1, 2024 | Autonomous Driving, Instruction Following | Code available
NeRF-MAE: Masked AutoEncoders for Self-Supervised 3D Representation Learning for Neural Radiance Fields | Apr 1, 2024 | 3D Object Detection, NeRF | Code available
OSMLoc: Single Image-Based Visual Localization in OpenStreetMap with Fused Geometric and Semantic Guidance | Nov 13, 2024 | Depth Estimation, Monocular Depth Estimation | Code available
Hier-SLAM: Scaling-up Semantics in SLAM with a Hierarchically Categorical Gaussian Splatting | Sep 19, 2024 | Scene Understanding, Semantic Segmentation | Code available
GroupViT: Semantic Segmentation Emerges from Text Supervision | Feb 22, 2022 | Object Detection, Scene Understanding | Code available
Inst3D-LMM: Instance-Aware 3D Scene Understanding with Multi-modal Instruction Tuning | Mar 1, 2025 | Scene Understanding | Code available
AutoTrust: Benchmarking Trustworthiness in Large Vision Language Models for Autonomous Driving | Dec 19, 2024 | Autonomous Driving, Benchmarking | Code available
GaussianPretrain: A Simple Unified 3D Gaussian Representation for Visual Pre-training in Autonomous Driving | Nov 19, 2024 | 3D Object Detection, Autonomous Driving | Code available
Grounded 3D-LLM with Referent Tokens | May 16, 2024 | Dense Captioning, Diversity | Code available
InvPT: Inverted Pyramid Multi-task Transformer for Dense Scene Understanding | Mar 15, 2022 | Boundary Detection, Human Parsing | Code available
FusionVision: A comprehensive approach of 3D object reconstruction and segmentation from RGB-D cameras using YOLO and fast segment anything | Feb 29, 2024 | 3D Object Reconstruction, Instance Segmentation | Code available
GEOBench-VLM: Benchmarking Vision-Language Models for Geospatial Tasks | Nov 28, 2024 | Benchmarking, Object Counting | Code available
GALIP: Generative Adversarial CLIPs for Text-to-Image Synthesis | Jan 30, 2023 | Image Generation, Scene Understanding | Code available
HAKE: A Knowledge Engine Foundation for Human Activity Understanding | Feb 14, 2022 | Action Recognition, Human-Object Interaction Detection | Code available
A Unified Framework for 3D Scene Understanding | Jul 3, 2024 | Contrastive Learning, Knowledge Distillation | Code available
Hypersim: A Photorealistic Synthetic Dataset for Holistic Indoor Scene Understanding | Nov 4, 2020 | Multi-Task Learning, Scene Understanding | Code available
IRef-VLA: A Benchmark for Interactive Referential Grounding with Imperfect Language in 3D Scenes | Mar 20, 2025 | Scene Understanding, Spatial Reasoning | Code available
BinsFormer: Revisiting Adaptive Bins for Monocular Depth Estimation | Apr 3, 2022 | Decoder, Depth Estimation | Code available
Learning Multiple Probabilistic Decisions from Latent World Model in Autonomous Driving | Sep 24, 2024 | Autonomous Driving, Imitation Learning | Code available
Lexicon3D: Probing Visual Foundation Models for Complex 3D Scene Understanding | Sep 5, 2024 | Question Answering, Scene Understanding | Code available
A Survey on Open-Vocabulary Detection and Segmentation: Past, Present, and Future | Jul 18, 2023 | Knowledge Distillation, Object Detection | Code available
Feed-Forward SceneDINO for Unsupervised Semantic Scene Completion | Jul 8, 2025 | 3D Geometry, Domain Generalization | Code available
Calib3D: Calibrating Model Preferences for Reliable 3D Scene Understanding | Mar 25, 2024 | Data Augmentation, Scene Understanding | Code available
EmbodiedScan: A Holistic Multi-Modal 3D Perception Suite Towards Embodied AI | Dec 26, 2023 | Scene Understanding | Code available
Emma-X: An Embodied Multimodal Action Model with Grounded Chain of Thought and Look-ahead Spatial Reasoning | Dec 16, 2024 | Hallucination, Robot Manipulation | Code available
CMX: Cross-Modal Fusion for RGB-X Semantic Segmentation with Transformers | Mar 9, 2022 | 3D Object Detection, Autonomous Vehicles | Code available
Chat-Scene: Bridging 3D Scene and Large Language Models with Object Identifiers | Dec 13, 2023 | 3D Question Answering (3D-QA), Attribute | Code available
Chat-3D: Data-efficiently Tuning Large Language Model for Universal Dialogue of 3D Scenes | Aug 17, 2023 | Language Modeling, Language Modelling | Code available
CLIP goes 3D: Leveraging Prompt Tuning for Language Grounded 3D Recognition | Mar 20, 2023 | Retrieval, Scene Understanding | Code available
OpenESS: Event-based Semantic Scene Understanding with Open Vocabularies | May 8, 2024 | Domain Adaptation, Scene Understanding | Code available
An Egocentric Vision-Language Model based Portable Real-time Smart Assistant | Mar 6, 2025 | Language Modeling, Language Modelling | Code available
An End-to-End Robust Point Cloud Semantic Segmentation Network with Single-Step Conditional Diffusion Models | Nov 25, 2024 | Denoising, Scene Understanding | Code available
COB-GS: Clear Object Boundaries in 3DGS Segmentation Based on Boundary-Adaptive Gaussian Splitting | Mar 25, 2025 | 3DGS, Object | Code available
ARKit LabelMaker: A New Scale for Indoor 3D Scene Understanding | Oct 17, 2024 | 3D Semantic Segmentation, Image Generation | Code available
Panoptic Scene Graph Generation | Jul 22, 2022 | Benchmarking, Panoptic Scene Graph Generation | Code available
Pixel-Wise Recognition for Holistic Surgical Scene Understanding | Jan 20, 2024 | Scene Understanding, Segmentation | Code available
EmbodiedOcc: Embodied 3D Occupancy Prediction for Vision-based Online Scene Understanding | Dec 5, 2024 | Prediction, Scene Understanding | Code available
Gaussian Grouping: Segment and Edit Anything in 3D Scenes | Dec 1, 2023 | Colorization, NeRF | Code available