When LLMs step into the 3D World: A Survey and Meta-Analysis of 3D Tasks via Multi-modal Large Language Models | May 16, 2024 | In-Context Learning, Question Answering | Code Available: 7
Trajectory Prediction Meets Large Language Models: A Survey | Jun 3, 2025 | Language Modeling, Language Modelling | Code Available: 5
OpenDriveVLA: Towards End-to-end Autonomous Driving with Large Vision Language Action Model | Mar 30, 2025 | Autonomous Driving, Decision Making | Code Available: 4
Distill Any Depth: Distillation Creates a Stronger Monocular Depth Estimator | Feb 26, 2025 | Depth Estimation, Diversity | Code Available: 4
GPT4Scene: Understand 3D Scenes from Videos with Vision-Language Models | Jan 2, 2025 | Scene Understanding, text annotation | Code Available: 4
Senna: Bridging Large Vision-Language Models and End-to-End Autonomous Driving | Oct 29, 2024 | Autonomous Driving, Scene Understanding | Code Available: 4
Prismatic VLMs: Investigating the Design Space of Visually-Conditioned Language Models | Feb 12, 2024 | Hallucination, Object Localization | Code Available: 4
SplaTAM: Splat, Track & Map 3D Gaussians for Dense RGB-D SLAM | Dec 4, 2023 | Camera Pose Estimation, Novel View Synthesis | Code Available: 4
Repurposing Diffusion-Based Image Generators for Monocular Depth Estimation | Dec 4, 2023 | Depth Estimation, GPU | Code Available: 4
DFormerv2: Geometry Self-Attention for RGBD Semantic Segmentation | Apr 7, 2025 | 3D geometry, RGBD Semantic Segmentation | Code Available: 3
SceneSplat: Gaussian Splatting-based Scene Understanding with Vision-Language Pretraining | Mar 23, 2025 | 3DGS, Benchmarking | Code Available: 3
CrossOver: 3D Scene Cross-Modal Alignment | Feb 20, 2025 | cross-modal alignment, Object | Code Available: 3
HERMES: A Unified Self-Driving World Model for Simultaneous 3D Scene Understanding and Generation | Jan 24, 2025 | Autonomous Driving, Language Modeling | Code Available: 3
STORM: Spatio-Temporal Reconstruction Model for Large-Scale Outdoor Scenes | Dec 31, 2024 | Dynamic Reconstruction, Scene Flow Estimation | Code Available: 3
EPRecon: An Efficient Framework for Real-Time Panoptic 3D Reconstruction from Monocular Video | Sep 3, 2024 | 3D Reconstruction, Scene Understanding | Code Available: 3
DeepInteraction++: Multi-Modality Interaction for Autonomous Driving | Aug 9, 2024 | 3D Object Detection, Autonomous Driving | Code Available: 3
AudioBench: A Universal Benchmark for Audio Large Language Models | Jun 23, 2024 | Audio Scene Understanding, Instruction Following | Code Available: 3
4D Panoptic Scene Graph Generation | May 16, 2024 | 4D Panoptic Segmentation, Graph Generation | Code Available: 3
Multi-Modal Data-Efficient 3D Scene Understanding for Autonomous Driving | May 8, 2024 | Autonomous Driving, LIDAR Semantic Segmentation | Code Available: 3
Sigma: Siamese Mamba Network for Multi-Modal Semantic Segmentation | Apr 5, 2024 | Decoder, Mamba | Code Available: 3
MoAI: Mixture of All Intelligence for Large Language and Vision Models | Mar 12, 2024 | All, Mixture-of-Experts | Code Available: 3
Embodied Understanding of Driving Scenarios | Mar 7, 2024 | Autonomous Driving, Language Modeling | Code Available: 3
Swin3D++: Effective Multi-Source Pretraining for 3D Indoor Scene Understanding | Feb 22, 2024 | Diversity, Scene Understanding | Code Available: 3
SGS-SLAM: Semantic Gaussian Splatting For Neural Dense SLAM | Feb 5, 2024 | 3D Semantic Segmentation, Camera Pose Estimation | Code Available: 3
GARField: Group Anything with Radiance Fields | Jan 17, 2024 | Scene Understanding | Code Available: 3
Generalized Robot 3D Vision-Language Model with Fast Rendering and Pre-Training Vision-Language Alignment | Dec 1, 2023 | Contrastive Learning, Few-Shot Learning | Code Available: 3
Monkey: Image Resolution and Text Label Are Important Things for Large Multi-modal Models | Nov 11, 2023 | Image Captioning, MMR total | Code Available: 3
iDisc: Internal Discretization for Monocular Depth Estimation | Apr 13, 2023 | Autonomous Driving, Depth Estimation | Code Available: 3
SurroundOcc: Multi-Camera 3D Occupancy Prediction for Autonomous Driving | Mar 16, 2023 | 3D Object Detection, Autonomous Driving | Code Available: 3
Feed-Forward SceneDINO for Unsupervised Semantic Scene Completion | Jul 8, 2025 | 3D geometry, Domain Generalization | Code Available: 2
SIU3R: Simultaneous Scene Understanding and 3D Reconstruction Beyond Feature Alignment | Jul 3, 2025 | 3D Reconstruction, Scene Understanding | Code Available: 2
Tackling View-Dependent Semantics in 3D Language Gaussian Splatting | May 30, 2025 | 3D Scene Reconstruction, Scene Understanding | Code Available: 2
Scene-Centric Unsupervised Panoptic Segmentation | Apr 2, 2025 | Instance Segmentation, Panoptic Segmentation | Code Available: 2
Towards Generating Realistic 3D Semantic Training Data for Autonomous Driving | Mar 27, 2025 | 3D Semantic Segmentation, Autonomous Driving | Code Available: 2
COB-GS: Clear Object Boundaries in 3DGS Segmentation Based on Boundary-Adaptive Gaussian Splitting | Mar 25, 2025 | 3DGS, Object | Code Available: 2
SuperFlow++: Enhanced Spatiotemporal Consistency for Cross-Modal Data Pretraining | Mar 25, 2025 | Autonomous Driving, Computational Efficiency | Code Available: 2
PolarFree: Polarization-based Reflection-free Imaging | Mar 23, 2025 | Reflection Removal, Scene Understanding | Code Available: 2
IRef-VLA: A Benchmark for Interactive Referential Grounding with Imperfect Language in 3D Scenes | Mar 20, 2025 | Scene Understanding, Spatial Reasoning | Code Available: 2
Crab: A Unified Audio-Visual Scene Understanding Model with Explicit Cooperation | Mar 17, 2025 | Data Interaction, Scene Understanding | Code Available: 2
TrackOcc: Camera-based 4D Panoptic Occupancy Tracking | Mar 11, 2025 | 3D Object Tracking, Object Tracking | Code Available: 2
Chameleon: Fast-slow Neuro-symbolic Lane Topology Extraction | Mar 10, 2025 | Autonomous Driving, Scene Understanding | Code Available: 2
An Egocentric Vision-Language Model based Portable Real-time Smart Assistant | Mar 6, 2025 | Language Modeling, Language Modelling | Code Available: 2
Inst3D-LMM: Instance-Aware 3D Scene Understanding with Multi-modal Instruction Tuning | Mar 1, 2025 | Scene Understanding | Code Available: 2
NavRAG: Generating User Demand Instructions for Embodied Navigation through Retrieval-Augmented LLM | Feb 16, 2025 | Navigate, RAG | Code Available: 2
VideoLifter: Lifting Videos to 3D with Fast Hierarchical Stereo Alignment | Jan 3, 2025 | Computational Efficiency, Scene Understanding | Code Available: 2
3DGraphLLM: Combining Semantic Graphs and Large Language Models for 3D Scene Understanding | Dec 24, 2024 | Natural Language Understanding, Scene Understanding | Code Available: 2
AutoTrust: Benchmarking Trustworthiness in Large Vision Language Models for Autonomous Driving | Dec 19, 2024 | Autonomous Driving, Benchmarking | Code Available: 2
RelationField: Relate Anything in Radiance Fields | Dec 18, 2024 | 3d scene graph generation, Graph Generation | Code Available: 2
DINO-Foresight: Looking into the Future with DINO | Dec 16, 2024 | Autonomous Driving, Scene Understanding | Code Available: 2
Emma-X: An Embodied Multimodal Action Model with Grounded Chain of Thought and Look-ahead Spatial Reasoning | Dec 16, 2024 | Hallucination, Robot Manipulation | Code Available: 2