Argus: Leveraging Multiview Images for Improved 3-D Scene Understanding With Large Language Models | Jul 17, 2025 | 3D Point Cloud Reconstruction, Point cloud reconstruction
— | Unverified | 0 | City-VLM: Towards Multidomain Perception Scene Understanding via Multimodal Incomplete Learning | Jul 17, 2025 | Question Answering, Scene Understanding
— | Unverified | 0 | Advancing Complex Wide-Area Scene Understanding with Hierarchical Coresets Selection | Jul 17, 2025 | Scene Understanding
— | Unverified | 0 | Seeing the Signs: A Survey of Edge-Deployable OCR Models for Billboard Visibility Analysis | Jul 15, 2025 | Marketing, Optical Character Recognition
— | Unverified | 0 | Tactical Decision for Multi-UGV Confrontation with a Vision-Language Model-Based Commander | Jul 15, 2025 | Language Modeling, Language Modelling
— | Unverified | 0 | Learning to Tune Like an Expert: Interpretable and Scene-Aware Navigation via MLLM Reasoning and CVAE-Based Adaptation | Jul 15, 2025 | Large Language Model, Scene Understanding
Code | Code Available | 1 | EmbRACE-3K: Embodied Reasoning and Action in Complex Environments | Jul 14, 2025 | Scene Understanding, Spatial Reasoning
— | Unverified | 0 | MUVOD: A Novel Multi-view Video Object Segmentation Dataset and A Benchmark for 3D Segmentation | Jul 10, 2025 | NeRF, Object
— | Unverified | 0 | OST-Bench: Evaluating the Capabilities of MLLMs in Online Spatio-temporal Scene Understanding | Jul 10, 2025 | Scene Understanding, Spatial Reasoning
Code | Code Available | 0 | What Demands Attention in Urban Street Scenes? From Scene Understanding towards Road Safety: A Survey of Vision-driven Datasets and Studies | Jul 9, 2025 | Scene Understanding, Survey
— | Unverified | 0 | Feed-Forward SceneDINO for Unsupervised Semantic Scene Completion | Jul 8, 2025 | 3D geometry, Domain Generalization
Code | Code Available | 2 | SIU3R: Simultaneous Scene Understanding and 3D Reconstruction Beyond Feature Alignment | Jul 3, 2025 | 3D Reconstruction, Scene Understanding
Code | Code Available | 2 | SurgTPGS: Semantic 3D Surgical Scene Understanding with Text Promptable Gaussian Splatting | Jun 29, 2025 | 3D Reconstruction, Scene Understanding
Code | Code Available | 1 | VoteSplat: Hough Voting Gaussian Splatting for 3D Scene Understanding | Jun 28, 2025 | 3DGS, Instance Segmentation
— | Unverified | 0 | ReME: A Data-Centric Framework for Training-Free Open-Vocabulary Segmentation | Jun 26, 2025 | Open Vocabulary Semantic Segmentation, Open-Vocabulary Semantic Segmentation
Code | Code Available | 1 | CoPa-SG: Dense Scene Graphs with Parametric and Proto-Relations | Jun 26, 2025 | Graph Generation, Relation
— | Unverified | 0 | IPFormer: Visual 3D Panoptic Scene Completion with Context-Adaptive Instance Proposals | Jun 25, 2025 | Scene Understanding
— | Unverified | 0 | DreamAnywhere: Object-Centric Panoramic 3D Scene Generation | Jun 25, 2025 | Novel View Synthesis, Object
— | Unverified | 0 | Case-based Reasoning Augmented Large Language Model Framework for Decision Making in Realistic Safety-Critical Driving Scenarios | Jun 25, 2025 | Autonomous Driving, Decision Making
— | Unverified | 0 | HOIverse: A Synthetic Scene Graph Dataset With Human Object Interactions | Jun 24, 2025 | Graph Generation, Human-Object Interaction Detection
— | Unverified | 0 | DIP: Unsupervised Dense In-Context Post-training of Visual Representations | Jun 23, 2025 | GPU, Meta-Learning
Code | Code Available | 1 | Scene-R1: Video-Grounded Large Language Models for 3D Scene Reasoning without 3D Annotations | Jun 21, 2025 | Question Answering, Scene Understanding
— | Unverified | 0 | SceneAware: Scene-Constrained Pedestrian Trajectory Prediction with LLM-Guided Walkability | Jun 17, 2025 | Pedestrian Trajectory Prediction, Scene Understanding
Code | Code Available | 0 | Leader360V: The Large-scale, Real-world 360 Video Dataset for Multi-task Learning in Diverse Environment | Jun 17, 2025 | Autonomous Driving, Instance Segmentation
— | Unverified | 0 | Image Segmentation with Large Language Models: A Survey with Perspectives for Intelligent Transportation Systems | Jun 17, 2025 | Autonomous Driving, Image Segmentation
— | Unverified | 0 | Unified Representation Space for 3D Visual Grounding | Jun 17, 2025 | 3D visual grounding, Contrastive Learning
— | Unverified | 0 | FreeQ-Graph: Free-form Querying with Semantic Consistent Scene Graph for 3D Scene Understanding | Jun 16, 2025 | Form, Graph Generation
— | Unverified | 0 | SceneCompleter: Dense 3D Scene Completion for Generative Novel View Synthesis | Jun 12, 2025 | Novel View Synthesis, Scene Understanding
— | Unverified | 0 | SemanticSplat: Feed-Forward 3D Scene Understanding with Language-Aware Gaussian Fields | Jun 11, 2025 | 3D Reconstruction, Scene Understanding
— | Unverified | 0 | PhyBlock: A Progressive Benchmark for Physical Understanding and Planning via 3D Block Assembly | Jun 10, 2025 | Question Answering, Scene Understanding
— | Unverified | 0 | Robust Visual Localization via Semantic-Guided Multi-Scale Transformer | Jun 10, 2025 | regression, Scene Understanding
— | Unverified | 0 | SceneSplat++: A Large Dataset and Comprehensive Benchmark for Language Gaussian Splatting | Jun 10, 2025 | 3DGS, Scene Understanding
— | Unverified | 0 | OpenSplat3D: Open-Vocabulary 3D Instance Segmentation using Gaussian Splatting | Jun 9, 2025 | 3DGS, 3D Instance Segmentation
— | Unverified | 0 | Design and Evaluation of Deep Learning-Based Dual-Spectrum Image Fusion Methods | Jun 9, 2025 | Fairness, Scene Understanding
— | Unverified | 0 | SpatialLM: Training Large Language Models for Structured Indoor Modeling | Jun 9, 2025 | 3D Object Detection, Language Modeling
— | Unverified | 0 | STSBench: A Spatio-temporal Scenario Benchmark for Multi-modal Large Language Models in Autonomous Driving | Jun 6, 2025 | Autonomous Driving, Autonomous Vehicles
Code | Code Available | 1 | Does Your 3D Encoder Really Work? When Pretrain-SFT from 2D VLMs Meets 3D VLMs | Jun 5, 2025 | cross-modal alignment, Dense Captioning
— | Unverified | 0 | ProJo4D: Progressive Joint Optimization for Sparse-View Inverse Physics Estimation | Jun 5, 2025 | 3D Reconstruction, NeRF
— | Unverified | 0 | OWMM-Agent: Open World Mobile Manipulation With Multi-modal Agentic Data Synthesis | Jun 4, 2025 | Action Generation, Decision Making
Code | Code Available | 1 | Tactile MNIST: Benchmarking Active Tactile Perception | Jun 3, 2025 | Benchmarking, Scene Understanding
— | Unverified | 0 | Attention-based transformer models for image captioning across languages: An in-depth survey and evaluation | Jun 3, 2025 | Caption Generation, Image Captioning
— | Unverified | 0 | Trajectory Prediction Meets Large Language Models: A Survey | Jun 3, 2025 | Language Modeling, Language Modelling
Code | Code Available | 5 | PhysGaia: A Physics-Aware Dataset of Multi-Body Interactions for Dynamic Novel View Synthesis | Jun 3, 2025 | Novel View Synthesis, Scene Understanding
Code | Code Available | 1 | SAM2-LOVE: Segment Anything Model 2 in Language-aided Audio-Visual Scenes | Jun 2, 2025 | Scene Understanding
— | Unverified | 0 | Learning Sparsity for Effective and Efficient Music Performance Question Answering | Jun 2, 2025 | Audio-visual Question Answering, Question Answering
— | Unverified | 0 | Tackling View-Dependent Semantics in 3D Language Gaussian Splatting | May 30, 2025 | 3D Scene Reconstruction, Scene Understanding
Code | Code Available | 2 | Learning from Videos for 3D World: Enhancing MLLMs with 3D Vision Geometry Priors | May 30, 2025 | 3D geometry, Large Language Model
Code | Code Available | 0 | SeG-SR: Integrating Semantic Knowledge into Remote Sensing Image Super-Resolution via Vision-Language Model | May 29, 2025 | Image Super-Resolution, Language Modeling
Code | Code Available | 0 | DORAEMON: Decentralized Ontology-aware Reliable Agent with Enhanced Memory Oriented Navigation | May 28, 2025 | Autonomous Navigation, RAG
— | Unverified | 0 | LiDAR Based Semantic Perception for Forklifts in Outdoor Environments | May 28, 2025 | Scene Understanding, Segmentation
— | Unverified | 0