Lexicon3D: Probing Visual Foundation Models for Complex 3D Scene Understanding | Sep 5, 2024 | Question Answering, Scene Understanding | Code Available | 2
Can LVLMs Obtain a Driver's License? A Benchmark Towards Reliable AGI for Autonomous Driving | Sep 4, 2024 | Autonomous Driving, Decision Making | Unverified | 0
Unveiling Deep Shadows: A Survey and Benchmark on Image and Video Shadow Detection, Removal, and Generation in the Deep Learning Era | Sep 3, 2024 | Scene Understanding, Shadow Detection | Code Available | 2
EPRecon: An Efficient Framework for Real-Time Panoptic 3D Reconstruction from Monocular Video | Sep 3, 2024 | 3D Reconstruction, Scene Understanding | Code Available | 3
GaussianPU: A Hybrid 2D-3D Upsampling Framework for Enhancing Color Point Clouds via 3D Gaussian Splatting | Sep 3, 2024 | 3DGS, GPU | Unverified | 0
Leaky Wave Antenna-Equipped RF Chipless Tags for Orientation Estimation | Aug 31, 2024 | Scene Understanding, TAG | Unverified | 0
AdaptVision: Dynamic Input Scaling in MLLMs for Versatile Scene Understanding | Aug 30, 2024 | Language Modelling, Large Language Model | Code Available | 0
UrBench: A Comprehensive Benchmark for Evaluating Large Multimodal Models in Multi-View Urban Scenarios | Aug 30, 2024 | Attribute, geo-localization | Code Available | 1
DriveGenVLM: Real-world Video Generation for Vision Language Model based Autonomous Driving | Aug 29, 2024 | Autonomous Driving, Denoising | Unverified | 0
Str-L Pose: Integrating Point and Structured Line for Relative Pose Estimation in Dual-Graph | Aug 28, 2024 | Autonomous Driving, Graph Neural Network | Unverified | 0
RoboSense: Large-scale Dataset and Benchmark for Egocentric Robot Perception and Navigation in Crowded and Unstructured Environments | Aug 28, 2024 | Autonomous Driving, Autonomous Navigation | Code Available | 2
BOX3D: Lightweight Camera-LiDAR Fusion for 3D Object Detection and Localization | Aug 27, 2024 | 3D Object Detection, Benchmarking | Unverified | 0
Interactive Occlusion Boundary Estimation through Exploitation of Synthetic Data | Aug 27, 2024 | Domain Adaptation, Scene Understanding | Unverified | 0
RSTeller: Scaling Up Visual Language Modeling in Remote Sensing with Rich Linguistic Semantics from Openly Available Data and Large Language Models | Aug 27, 2024 | Descriptive, Language Modeling | Code Available | 1
Handling Geometric Domain Shifts in Semantic Segmentation of Surgical RGB and Hyperspectral Images | Aug 27, 2024 | Organ Segmentation, Scene Segmentation | Unverified | 0
MTMamba++: Enhancing Multi-Task Dense Scene Understanding via Mamba-Based Decoders | Aug 27, 2024 | Decoder, Mamba | Code Available | 1
FusionSAM: Latent Space driven Segment Anything Model for Multimodal Fusion and Segmentation | Aug 26, 2024 | Autonomous Driving, Image Segmentation | Unverified | 0
3D-VirtFusion: Synthetic 3D Data Augmentation through Generative Diffusion Models and Controllable Editing | Aug 25, 2024 | Data Augmentation, Diversity | Unverified | 0
Extremely Fine-Grained Visual Classification over Resembling Glyphs in the Wild | Aug 25, 2024 | Contrastive Learning, Fine-Grained Image Classification | Code Available | 0
Making Large Language Models Better Planners with Reasoning-Decision Alignment | Aug 25, 2024 | Autonomous Driving, Decision Making | Unverified | 0
Exploring Scene Affinity for Semi-Supervised LiDAR Semantic Segmentation | Aug 21, 2024 | 3D Semantic Segmentation, Data Augmentation | Code Available | 0
OpenScan: A Benchmark for Generalized Open-Vocabulary 3D Scene Understanding | Aug 20, 2024 | Object, Scene Understanding | Code Available | 1
Near, far: Patch-ordering enhances vision foundation models' scene understanding | Aug 20, 2024 | GPU, Scene Understanding | Unverified | 0
3D-Aware Instance Segmentation and Tracking in Egocentric Videos | Aug 19, 2024 | 3D Object Reconstruction, Instance Segmentation | Unverified | 0
SpectralGaussians: Semantic, spectral 3D Gaussian splatting for multi-spectral scene representation, visualization and analysis | Aug 13, 2024 | 3DGS, Scene Understanding | Unverified | 0
SceneGPT: A Language Model for 3D Scene Understanding | Aug 13, 2024 | In-Context Learning, Language Modeling | Unverified | 0
HeLiMOS: A Dataset for Moving Object Segmentation in 3D Point Clouds From Heterogeneous LiDAR Sensors | Aug 12, 2024 | Scene Understanding, Semantic Segmentation | Unverified | 0
Spherical World-Locking for Audio-Visual Localization in Egocentric Videos | Aug 9, 2024 | Active Speaker Localization, Decoder | Unverified | 0
DeepInteraction++: Multi-Modality Interaction for Autonomous Driving | Aug 9, 2024 | 3D Object Detection, Autonomous Driving | Code Available | 3
Query3D: LLM-Powered Open-Vocabulary Scene Segmentation with Language Embedded 3D Gaussian | Aug 7, 2024 | Autonomous Driving, object-detection | Code Available | 1
Complete 3d relationships extraction modality alignment network for 3d dense captioning | Aug 1, 2024 | 3D dense captioning, 3D Object Detection | Unverified | 0
A Plug-and-Play Method for Rare Human-Object Interactions Detection by Bridging Domain Gap | Jul 31, 2024 | Human-Object Interaction Detection, Image Reconstruction | Code Available | 0
DEF-oriCORN: efficient 3D scene understanding for robust language-directed manipulation without demonstrations | Jul 31, 2024 | Motion Planning, Scene Understanding | Unverified | 0
From Feature Importance to Natural Language Explanations Using LLMs with RAG | Jul 30, 2024 | counterfactual, Counterfactual Reasoning | Code Available | 0
Dynamic Scene Understanding through Object-Centric Voxelization and Neural Rendering | Jul 30, 2024 | Inverse Rendering, NeRF | Code Available | 1
NIS-SLAM: Neural Implicit Semantic RGB-D SLAM for 3D Consistent Scene Understanding | Jul 30, 2024 | Scene Understanding, Simultaneous Localization and Mapping | Unverified | 0
Rethinking RGB-D Fusion for Semantic Segmentation in Surgical Datasets | Jul 29, 2024 | Decoder, Scene Understanding | Unverified | 0
ASI-Seg: Audio-Driven Surgical Instrument Segmentation with Surgeon Intention Understanding | Jul 28, 2024 | Contrastive Learning, Intention-oriented Segmentation | Code Available | 0
GP-VLS: A general-purpose vision language model for surgery | Jul 27, 2024 | Language Modeling, Language Modelling | Unverified | 0
Answerability Fields: Answerable Location Estimation via Diffusion Models | Jul 26, 2024 | Question Answering, Scene Understanding | Unverified | 0
3D Question Answering for City Scene Understanding | Jul 24, 2024 | Autonomous Driving, Question Answering | Unverified | 0
Augmented Efficiency: Reducing Memory Footprint and Accelerating Inference for 3D Semantic Segmentation through Hybrid Vision | Jul 23, 2024 | 2D Semantic Segmentation, 3D Semantic Segmentation | Unverified | 0
InLUT3D: Challenging real indoor dataset for point cloud analysis | Jul 22, 2024 | Benchmarking, Scene Understanding | Unverified | 0
VideoGameBunny: Towards vision assistants for video games | Jul 21, 2024 | Image Captioning, Scene Understanding | Unverified | 0
A New Lightweight Hybrid Graph Convolutional Neural Network -- CNN Scheme for Scene Classification using Object Detection Inference | Jul 19, 2024 | Autonomous Vehicles, object-detection | Code Available | 0
GaussianBeV: 3D Gaussian Representation meets Perception Models for BeV Segmentation | Jul 19, 2024 | BEV Segmentation, Scene Understanding | Unverified | 0
MC-PanDA: Mask Confidence for Panoptic Domain Adaptation | Jul 19, 2024 | Domain Adaptation, Panoptic Segmentation | Code Available | 0
OpenSU3D: Open World 3D Scene Understanding using Foundation Models | Jul 19, 2024 | Scene Understanding, Spatial Reasoning | Unverified | 0
Open Vocabulary 3D Scene Understanding via Geometry Guided Self-Distillation | Jul 18, 2024 | Knowledge Distillation, Representation Learning | Unverified | 0
Training-Free Model Merging for Multi-target Domain Adaptation | Jul 18, 2024 | Domain Adaptation, Multi-target Domain Adaptation | Unverified | 0