When LLMs step into the 3D World: A Survey and Meta-Analysis of 3D Tasks via Multi-modal Large Language Models · May 16, 2024 · In-Context Learning, Question Answering · Code available · 7
Trajectory Prediction Meets Large Language Models: A Survey · Jun 3, 2025 · Language Modeling, Language Modelling · Code available · 5
OpenDriveVLA: Towards End-to-end Autonomous Driving with Large Vision Language Action Model · Mar 30, 2025 · Autonomous Driving, Decision Making · Code available · 4
GPT4Scene: Understand 3D Scenes from Videos with Vision-Language Models · Jan 2, 2025 · Scene Understanding, text annotation · Code available · 4
Senna: Bridging Large Vision-Language Models and End-to-End Autonomous Driving · Oct 29, 2024 · Autonomous Driving, Scene Understanding · Code available · 4
Repurposing Diffusion-Based Image Generators for Monocular Depth Estimation · Dec 4, 2023 · Depth Estimation, GPU · Code available · 4
Prismatic VLMs: Investigating the Design Space of Visually-Conditioned Language Models · Feb 12, 2024 · Hallucination, Object Localization · Code available · 4
Distill Any Depth: Distillation Creates a Stronger Monocular Depth Estimator · Feb 26, 2025 · Depth Estimation, Diversity · Code available · 4
SplaTAM: Splat, Track & Map 3D Gaussians for Dense RGB-D SLAM · Dec 4, 2023 · Camera Pose Estimation, Novel View Synthesis · Code available · 4
SceneSplat: Gaussian Splatting-based Scene Understanding with Vision-Language Pretraining · Mar 23, 2025 · 3DGS, Benchmarking · Code available · 3
CrossOver: 3D Scene Cross-Modal Alignment · Feb 20, 2025 · cross-modal alignment, Object · Code available · 3
EPRecon: An Efficient Framework for Real-Time Panoptic 3D Reconstruction from Monocular Video · Sep 3, 2024 · 3D Reconstruction, Scene Understanding · Code available · 3
Embodied Understanding of Driving Scenarios · Mar 7, 2024 · Autonomous Driving, Language Modeling · Code available · 3
iDisc: Internal Discretization for Monocular Depth Estimation · Apr 13, 2023 · Autonomous Driving, Depth Estimation · Code available · 3
DeepInteraction++: Multi-Modality Interaction for Autonomous Driving · Aug 9, 2024 · 3D Object Detection, Autonomous Driving · Code available · 3
HERMES: A Unified Self-Driving World Model for Simultaneous 3D Scene Understanding and Generation · Jan 24, 2025 · Autonomous Driving, Language Modeling · Code available · 3
GARField: Group Anything with Radiance Fields · Jan 17, 2024 · Scene Understanding · Code available · 3
STORM: Spatio-Temporal Reconstruction Model for Large-Scale Outdoor Scenes · Dec 31, 2024 · Dynamic Reconstruction, Scene Flow Estimation · Code available · 3
SGS-SLAM: Semantic Gaussian Splatting For Neural Dense SLAM · Feb 5, 2024 · 3D Semantic Segmentation, Camera Pose Estimation · Code available · 3
Swin3D++: Effective Multi-Source Pretraining for 3D Indoor Scene Understanding · Feb 22, 2024 · Diversity, Scene Understanding · Code available · 3
Generalized Robot 3D Vision-Language Model with Fast Rendering and Pre-Training Vision-Language Alignment · Dec 1, 2023 · Contrastive Learning, Few-Shot Learning · Code available · 3
AudioBench: A Universal Benchmark for Audio Large Language Models · Jun 23, 2024 · Audio Scene Understanding, Instruction Following · Code available · 3
4D Panoptic Scene Graph Generation · May 16, 2024 · 4D Panoptic Segmentation, Graph Generation · Code available · 3
SurroundOcc: Multi-Camera 3D Occupancy Prediction for Autonomous Driving · Mar 16, 2023 · 3D Object Detection, Autonomous Driving · Code available · 3
Sigma: Siamese Mamba Network for Multi-Modal Semantic Segmentation · Apr 5, 2024 · Decoder, Mamba · Code available · 3
Multi-Modal Data-Efficient 3D Scene Understanding for Autonomous Driving · May 8, 2024 · Autonomous Driving, LIDAR Semantic Segmentation · Code available · 3
MoAI: Mixture of All Intelligence for Large Language and Vision Models · Mar 12, 2024 · All, Mixture-of-Experts · Code available · 3
DFormerv2: Geometry Self-Attention for RGBD Semantic Segmentation · Apr 7, 2025 · 3D geometry, RGBD Semantic Segmentation · Code available · 3
Monkey: Image Resolution and Text Label Are Important Things for Large Multi-modal Models · Nov 11, 2023 · Image Captioning, MMR total · Code available · 3
Inst3D-LMM: Instance-Aware 3D Scene Understanding with Multi-modal Instruction Tuning · Mar 1, 2025 · Scene Understanding · Code available · 2
InvPT: Inverted Pyramid Multi-task Transformer for Dense Scene Understanding · Mar 15, 2022 · Boundary Detection, Human Parsing · Code available · 2
Hier-SLAM: Scaling-up Semantics in SLAM with a Hierarchically Categorical Gaussian Splatting · Sep 19, 2024 · Scene Understanding, Semantic Segmentation · Code available · 2
BinsFormer: Revisiting Adaptive Bins for Monocular Depth Estimation · Apr 3, 2022 · Decoder, Depth Estimation · Code available · 2
Hypersim: A Photorealistic Synthetic Dataset for Holistic Indoor Scene Understanding · Nov 4, 2020 · Multi-Task Learning, Scene Understanding · Code available · 2
InvPT++: Inverted Pyramid Multi-Task Transformer for Visual Scene Understanding · Jun 8, 2023 · Decoder, Multi-Task Learning · Code available · 2
GroupViT: Semantic Segmentation Emerges from Text Supervision · Feb 22, 2022 · Object Detection, Scene Understanding · Code available · 2
Grounded 3D-LLM with Referent Tokens · May 16, 2024 · Dense Captioning, Diversity · Code available · 2
HAKE: A Knowledge Engine Foundation for Human Activity Understanding · Feb 14, 2022 · Action Recognition, Human-Object Interaction Detection · Code available · 2
GEOBench-VLM: Benchmarking Vision-Language Models for Geospatial Tasks · Nov 28, 2024 · Benchmarking, Object Counting · Code available · 2
AutoTrust: Benchmarking Trustworthiness in Large Vision Language Models for Autonomous Driving · Dec 19, 2024 · Autonomous Driving, Benchmarking · Code available · 2
3DGraphLLM: Combining Semantic Graphs and Large Language Models for 3D Scene Understanding · Dec 24, 2024 · Natural Language Understanding, Scene Understanding · Code available · 2
GaussianPretrain: A Simple Unified 3D Gaussian Representation for Visual Pre-training in Autonomous Driving · Nov 19, 2024 · 3D Object Detection, Autonomous Driving · Code available · 2
IRef-VLA: A Benchmark for Interactive Referential Grounding with Imperfect Language in 3D Scenes · Mar 20, 2025 · Scene Understanding, Spatial Reasoning · Code available · 2
A Unified Framework for 3D Scene Understanding · Jul 3, 2024 · Contrastive Learning, Knowledge Distillation · Code available · 2
FusionVision: A comprehensive approach of 3D object reconstruction and segmentation from RGB-D cameras using YOLO and fast segment anything · Feb 29, 2024 · 3D Object Reconstruction, Instance Segmentation · Code available · 2
A Survey on Open-Vocabulary Detection and Segmentation: Past, Present, and Future · Jul 18, 2023 · Knowledge Distillation, object-detection · Code available · 2
Feed-Forward SceneDINO for Unsupervised Semantic Scene Completion · Jul 8, 2025 · 3D geometry, Domain Generalization · Code available · 2
GALIP: Generative Adversarial CLIPs for Text-to-Image Synthesis · Jan 30, 2023 · Image Generation, Scene Understanding · Code available · 2
EmbodiedScan: A Holistic Multi-Modal 3D Perception Suite Towards Embodied AI · Dec 26, 2023 · Scene Understanding · Code available · 2
EmbodiedOcc: Embodied 3D Occupancy Prediction for Vision-based Online Scene Understanding · Dec 5, 2024 · Prediction, Scene Understanding · Code available · 2