Towards Precise 3D Human Pose Estimation with Multi-Perspective Spatial-Temporal Relational Transformers | Jan 30, 2024 | 3D Human Pose Estimation, Pose Estimation | Code Available | 1
Digital Divides in Scene Recognition: Uncovering Socioeconomic Biases in Deep Learning Systems | Jan 23, 2024 | Scene Classification, Scene Recognition | Unverified | 0
AutoRT: Embodied Foundation Models for Large Scale Orchestration of Robotic Agents | Jan 23, 2024 | Instruction Following, Scene Understanding | Unverified | 0
UniM-OV3D: Uni-Modality Open-Vocabulary 3D Scene Understanding with Fine-Grained Feature Representation | Jan 21, 2024 | Instance Segmentation, Scene Understanding | Code Available | 1
S^3M-Net: Joint Learning of Semantic Segmentation and Stereo Matching for Autonomous Driving | Jan 21, 2024 | Autonomous Driving, Scene Understanding | Unverified | 0
Pixel-Wise Recognition for Holistic Surgical Scene Understanding | Jan 20, 2024 | Scene Understanding, Segmentation | Code Available | 2
BPDO: Boundary Points Dynamic Optimization for Arbitrary Shape Scene Text Detection | Jan 18, 2024 | Diversity, Scene Text Detection | Unverified | 0
ICGNet: A Unified Approach for Instance-Centric Grasping | Jan 18, 2024 | Object, Object Reconstruction | Code Available | 0
GARField: Group Anything with Radiance Fields | Jan 17, 2024 | Scene Understanding | Code Available | 3
SceneVerse: Scaling 3D Vision-Language Learning for Grounded Scene Understanding | Jan 17, 2024 | 3D visual grounding, Scene Understanding | Unverified | 0
RSUD20K: A Dataset for Road Scene Understanding In Autonomous Driving | Jan 14, 2024 | Autonomous Driving, Benchmarking | Code Available | 1
Class-Imbalanced Semi-Supervised Learning for Large-Scale Point Cloud Semantic Segmentation via Decoupling Optimization | Jan 13, 2024 | Pseudo Label, Representation Learning | Unverified | 0
Learning Segmented 3D Gaussians via Efficient Feature Unprojection for Zero-shot Neural Scene Segmentation | Jan 11, 2024 | Decoder, Panoptic Segmentation | Unverified | 0
Exploring Self- and Cross-Triplet Correlations for Human-Object Interaction Detection | Jan 11, 2024 | Human-Object Interaction Detection, Knowledge Distillation | Unverified | 0
VLP: Vision Language Planning for Autonomous Driving | Jan 10, 2024 | Autonomous Driving, Motion Planning | Unverified | 0
FunnyNet-W: Multimodal Learning of Funny Moments in Videos in the Wild | Jan 8, 2024 | Language Modelling, Large Language Model | Code Available | 0
3DMIT: 3D Multi-modal Instruction Tuning for Scene Understanding | Jan 6, 2024 | Scene Understanding, Visual Question Answering (VQA) | Code Available | 1
FMGS: Foundation Model Embedded 3D Gaussian Splatting for Holistic 3D Scene Understanding | Jan 3, 2024 | object-detection, Object Detection | Unverified | 0
Unsupervised 3D Structure Inference from Category-Specific Image Collections | Jan 1, 2024 | Graph Matching, Object | Unverified | 0
When Visual Grounding Meets Gigapixel-level Large-scale Scenes: Benchmark and Approach | Jan 1, 2024 | Scene Understanding, Visual Grounding | Unverified | 0
Bilateral Adaptation for Human-Object Interaction Detection with Occlusion-Robustness | Jan 1, 2024 | Human-Object Interaction Detection, object-detection | Unverified | 0
PanoRecon: Real-Time Panoptic 3D Reconstruction from Monocular Video | Jan 1, 2024 | 3D Panoptic Segmentation, 3D Reconstruction | Code Available | 0
Towards CLIP-driven Language-free 3D Visual Grounding via 2D-3D Relational Enhancement and Consistency | Jan 1, 2024 | 3D visual grounding, Relation | Code Available | 0
MAPLM: A Real-World Large-Scale Vision-Language Benchmark for Map and Traffic Scene Understanding | Jan 1, 2024 | Autonomous Driving, Instruction Following | Code Available | 2
SceneFun3D: Fine-Grained Functionality and Affordance Understanding in 3D Scenes | Jan 1, 2024 | Instance Segmentation, Motion Estimation | Unverified | 0
Going Beyond Multi-Task Dense Prediction with Synergy Embedding Models | Jan 1, 2024 | Scene Understanding | Unverified | 0
Omni-Q: Omni-Directional Scene Understanding for Unsupervised Visual Grounding | Jan 1, 2024 | Scene Understanding, Visual Grounding | Unverified | 0
Robust Multi-Modal Image Stitching for Improved Scene Understanding | Dec 28, 2023 | Image Stitching, Scene Understanding | Unverified | 0
Cloud-Device Collaborative Learning for Multimodal Large Language Models | Dec 26, 2023 | Device-Cloud Collaboration, Knowledge Distillation | Unverified | 0
EmbodiedScan: A Holistic Multi-Modal 3D Perception Suite Towards Embodied AI | Dec 26, 2023 | Scene Understanding | Code Available | 2
DI-V2X: Learning Domain-Invariant Representation for Vehicle-Infrastructure Collaborative 3D Object Detection | Dec 25, 2023 | 3D Object Detection, object-detection | Code Available | 1
WildScenes: A Benchmark for 2D and 3D Semantic Segmentation in Large-scale Natural Environments | Dec 23, 2023 | 3D Semantic Segmentation, Domain Adaptation | Code Available | 1
Pola4All: survey of polarimetric applications and an open-source toolkit to analyze polarization | Dec 22, 2023 | 3D Reconstruction, Depth Estimation | Code Available | 1
BridgeNet: Comprehensive and Effective Feature Interactions via Bridge Feature for Multi-task Dense Predictions | Dec 21, 2023 | Decoder, Multi-Task Learning | Unverified | 0
Object Attribute Matters in Visual Question Answering | Dec 20, 2023 | Attribute, Graph Neural Network | Code Available | 0
AccidentGPT: Accident Analysis and Prevention from V2X Environmental Perception with Multi-modal Large Model | Dec 20, 2023 | Autonomous Driving, Scene Understanding | Unverified | 0
Language-Assisted 3D Scene Understanding | Dec 18, 2023 | 3D Object Detection, 3D Semantic Segmentation | Unverified | 0
Open3DIS: Open-Vocabulary 3D Instance Segmentation with 2D Mask Guidance | Dec 17, 2023 | 3D Instance Segmentation, 3D Open-Vocabulary Instance Segmentation | Code Available | 1
Simple Image-level Classification Improves Open-vocabulary Object Detection | Dec 16, 2023 | Knowledge Distillation, Object | Code Available | 1
Transformers in Unsupervised Structure-from-Motion | Dec 16, 2023 | Decision Making, image-classification | Code Available | 1
Weakly-Supervised 3D Visual Grounding based on Visual Linguistic Alignment | Dec 15, 2023 | 3D visual grounding, Natural Language Queries | Unverified | 0
VMT-Adapter: Parameter-Efficient Transfer Learning for Multi-Task Dense Scene Understanding | Dec 14, 2023 | Scene Understanding, Transfer Learning | Unverified | 0
Dietary Assessment with Multimodal ChatGPT: A Systematic Analysis | Dec 14, 2023 | Image Captioning, Scene Understanding | Unverified | 0
Zoom in on the Plant: Fine-grained Analysis of Leaf, Stem and Vein Instances | Dec 14, 2023 | Scene Understanding | Code Available | 0
Living Scenes: Multi-object Relocalization and Reconstruction in Changing 3D Environments | Dec 14, 2023 | 3D Reconstruction, Decoder | Code Available | 1
Chat-Scene: Bridging 3D Scene and Large Language Models with Object Identifiers | Dec 13, 2023 | 3D Question Answering (3D-QA), Attribute | Code Available | 2
X4D-SceneFormer: Enhanced Scene Understanding on 4D Point Cloud Videos through Cross-modal Knowledge Transfer | Dec 12, 2023 | Action Recognition, Action Segmentation | Code Available | 0
Spatiotemporal Event Graphs for Dynamic Scene Understanding | Dec 11, 2023 | Action Detection, Activity Detection | Unverified | 0
SkyScenes: A Synthetic Dataset for Aerial Scene Understanding | Dec 11, 2023 | Diversity, Scene Understanding | Unverified | 0
Cataract-1K: Cataract Surgery Dataset for Scene Segmentation, Phase Recognition, and Irregularity Detection | Dec 11, 2023 | Benchmarking, Domain Adaptation | Unverified | 0