On the Road with GPT-4V(ision): Early Explorations of Visual-Language Model on Autonomous Driving Nov 9, 2023 Autonomous Driving Common Sense Reasoning
Multi-Task Learning as Multi-Objective Optimization Oct 10, 2018 Depth Estimation General Classification
RS-Agent: Automating Remote Sensing Tasks through Intelligent Agent Jun 11, 2024 AI Agent Descriptive
Safety-Enhanced Autonomous Driving Using Interpretable Sensor Fusion Transformer Jul 28, 2022 Autonomous Driving Autonomous Vehicles
NavRAG: Generating User Demand Instructions for Embodied Navigation through Retrieval-Augmented LLM Feb 16, 2025 Navigate RAG
Chat-3D: Data-efficiently Tuning Large Language Model for Universal Dialogue of 3D Scenes Aug 17, 2023 Language Modeling Language Modelling
OSMLoc: Single Image-Based Visual Localization in OpenStreetMap with Fused Geometric and Semantic Guidance Nov 13, 2024 Depth Estimation Monocular Depth Estimation
MAPLM: A Real-World Large-Scale Vision-Language Benchmark for Map and Traffic Scene Understanding Jan 1, 2024 Autonomous Driving Instruction Following
Crab: A Unified Audio-Visual Scene Understanding Model with Explicit Cooperation Mar 17, 2025 Data Interaction Scene Understanding
MTVQA: Benchmarking Multilingual Text-Centric Visual Question Answering May 20, 2024 Benchmarking Question Answering
Chameleon: Fast-slow Neuro-symbolic Lane Topology Extraction Mar 10, 2025 Autonomous Driving Scene Understanding
StableSemantics: A Synthetic Language-Vision Dataset of Semantic Representations in Naturalistic Images Jun 19, 2024 Object Recognition Scene Understanding
Lexicon3D: Probing Visual Foundation Models for Complex 3D Scene Understanding Sep 5, 2024 Question Answering Scene Understanding
InvPT++: Inverted Pyramid Multi-Task Transformer for Visual Scene Understanding Jun 8, 2023 Decoder Multi-Task Learning
IRef-VLA: A Benchmark for Interactive Referential Grounding with Imperfect Language in 3D Scenes Mar 20, 2025 Scene Understanding Spatial Reasoning
LSceneLLM: Enhancing Large 3D Scene Understanding Using Adaptive Visual Preferences Dec 2, 2024 Embodied Question Answering Question Answering
Panoptic Lifting for 3D Scene Understanding with Neural Fields Dec 19, 2022 2D Panoptic Segmentation Panoptic Segmentation
Hypersim: A Photorealistic Synthetic Dataset for Holistic Indoor Scene Understanding Nov 4, 2020 Multi-Task Learning Scene Understanding
HAKE: A Knowledge Engine Foundation for Human Activity Understanding Feb 14, 2022 Action Recognition Human-Object Interaction Detection
GroupViT: Semantic Segmentation Emerges from Text Supervision Feb 22, 2022 Object Detection Scene Understanding
GEOBench-VLM: Benchmarking Vision-Language Models for Geospatial Tasks Nov 28, 2024 Benchmarking Object Counting
ARKit LabelMaker: A New Scale for Indoor 3D Scene Understanding Oct 17, 2024 3D Semantic Segmentation Image Generation
COB-GS: Clear Object Boundaries in 3DGS Segmentation Based on Boundary-Adaptive Gaussian Splitting Mar 25, 2025 3DGS Object
Grounded 3D-LLM with Referent Tokens May 16, 2024 Dense Captioning Diversity
Inst3D-LMM: Instance-Aware 3D Scene Understanding with Multi-modal Instruction Tuning Mar 1, 2025 Scene Understanding
GALIP: Generative Adversarial CLIPs for Text-to-Image Synthesis Jan 30, 2023 Image Generation Scene Understanding
A Survey on Open-Vocabulary Detection and Segmentation: Past, Present, and Future Jul 18, 2023 Knowledge Distillation object-detection
Hier-SLAM: Scaling-up Semantics in SLAM with a Hierarchically Categorical Gaussian Splatting Sep 19, 2024 Scene Understanding Semantic Segmentation
FusionVision: A comprehensive approach of 3D object reconstruction and segmentation from RGB-D cameras using YOLO and fast segment anything Feb 29, 2024 3D Object Reconstruction Instance Segmentation
A Unified Framework for 3D Scene Understanding Jul 3, 2024 Contrastive Learning Knowledge Distillation
PLA: Language-Driven Open-Vocabulary 3D Scene Understanding Nov 29, 2022 3D Open-Vocabulary Instance Segmentation Contrastive Learning
Learning Multiple Probabilistic Decisions from Latent World Model in Autonomous Driving Sep 24, 2024 Autonomous Driving Imitation Learning
BinsFormer: Revisiting Adaptive Bins for Monocular Depth Estimation Apr 3, 2022 Decoder Depth Estimation
AutoTrust: Benchmarking Trustworthiness in Large Vision Language Models for Autonomous Driving Dec 19, 2024 Autonomous Driving Benchmarking
CMX: Cross-Modal Fusion for RGB-X Semantic Segmentation with Transformers Mar 9, 2022 3D Object Detection Autonomous Vehicles
CLIP goes 3D: Leveraging Prompt Tuning for Language Grounded 3D Recognition Mar 20, 2023 Retrieval Scene Understanding
Feed-Forward SceneDINO for Unsupervised Semantic Scene Completion Jul 8, 2025 3D geometry Domain Generalization
Gaussian Grouping: Segment and Edit Anything in 3D Scenes Dec 1, 2023 Colorization NeRF
NeRF-MAE: Masked AutoEncoders for Self-Supervised 3D Representation Learning for Neural Radiance Fields Apr 1, 2024 3D Object Detection NeRF
On Deep Learning for Geometric and Semantic Scene Understanding Using On-Vehicle 3D LiDAR Nov 1, 2024 3D Semantic Segmentation Autonomous Driving
OpenESS: Event-based Semantic Scene Understanding with Open Vocabularies May 8, 2024 Domain Adaptation Scene Understanding
OpenIns3D: Snap and Lookup for 3D Open-vocabulary Instance Segmentation Sep 1, 2023 3D Open-Vocabulary Instance Segmentation 3D Open-Vocabulary Object Detection
An Egocentric Vision-Language Model based Portable Real-time Smart Assistant Mar 6, 2025 Language Modeling Language Modelling
An End-to-End Robust Point Cloud Semantic Segmentation Network with Single-Step Conditional Diffusion Models Nov 25, 2024 Denoising Scene Understanding
EmbodiedScan: A Holistic Multi-Modal 3D Perception Suite Towards Embodied AI Dec 26, 2023 Scene Understanding
Emma-X: An Embodied Multimodal Action Model with Grounded Chain of Thought and Look-ahead Spatial Reasoning Dec 16, 2024 Hallucination Robot Manipulation
Panoptic Scene Graph Generation Jul 22, 2022 Benchmarking Panoptic Scene Graph Generation
Chat-Scene: Bridging 3D Scene and Large Language Models with Object Identifiers Dec 13, 2023 3D Question Answering (3D-QA) Attribute
EmbodiedOcc: Embodied 3D Occupancy Prediction for Vision-based Online Scene Understanding Dec 5, 2024 Prediction Scene Understanding
GaussianPretrain: A Simple Unified 3D Gaussian Representation for Visual Pre-training in Autonomous Driving Nov 19, 2024 3D Object Detection Autonomous Driving