TADFormer: Task-Adaptive Dynamic Transformer for Efficient Multi-Task Learning | Jan 8, 2025 | Multi-Task Learning, parameter-efficient fine-tuning | Unverified 0
Advancing the Understanding of Fine-Grained 3D Forest Structures using Digital Cousins and Simulation-to-Reality: Methods and Datasets | Jan 7, 2025 | Data Augmentation, parameter estimation | Unverified 0
LargeAD: Large-Scale Cross-Sensor Data Pretraining for Autonomous Driving | Jan 7, 2025 | Autonomous Driving, Contrastive Learning | Unverified 0
CL3DOR: Contrastive Learning for 3D Large Multimodal Models via Odds Ratio on High-Resolution Point Clouds | Jan 7, 2025 | Contrastive Learning, Language Modeling | Unverified 0
IAM: Enhancing RGB-D Instance Segmentation with New Benchmarks | Jan 3, 2025 | Data Integration, Image Segmentation | Code available 0
VideoLifter: Lifting Videos to 3D with Fast Hierarchical Stereo Alignment | Jan 3, 2025 | Computational Efficiency, Scene Understanding | Code available 2
3D-LLaVA: Towards Generalist 3D LMMs with Omni Superpoint Transformer | Jan 2, 2025 | Scene Understanding | Unverified 0
GPT4Scene: Understand 3D Scenes from Videos with Vision-Language Models | Jan 2, 2025 | Scene Understanding, text annotation | Code available 4
Leverage Cross-Attention for End-to-End Open-Vocabulary Panoptic Reconstruction | Jan 2, 2025 | Instance Segmentation, Scene Understanding | Unverified 0
GroundingFace: Fine-grained Face Understanding via Pixel Grounding Multimodal Large Language Model | Jan 1, 2025 | Attribute, Language Modeling | Unverified 0
Vision-Language Embodiment for Monocular Depth Estimation | Jan 1, 2025 | 3D Reconstruction, Depth Estimation | Unverified 0
3D-MVP: 3D Multiview Pretraining for Manipulation | Jan 1, 2025 | Decoder, Robot Manipulation | Unverified 0
HUSH: Holistic Panoramic 3D Scene Understanding using Spherical Harmonics | Jan 1, 2025 | Depth Estimation, Room Layout Estimation | Unverified 0
Beyond Human Perception: Understanding Multi-Object World from Monocular View | Jan 1, 2025 | 3D visual grounding, Denoising | Code available 0
Scene Map-based Prompt Tuning for Navigation Instruction Generation | Jan 1, 2025 | Scene Understanding | Unverified 0
AVQACL: A Novel Benchmark for Audio-Visual Question Answering Continual Learning | Jan 1, 2025 | Audio-visual Question Answering, Continual Learning | Code available 0
All-Day Multi-Camera Multi-Target Tracking | Jan 1, 2025 | All, Mamba | Code available 1
Object-aware Sound Source Localization via Audio-Visual Scene Understanding | Jan 1, 2025 | Scene Understanding, Sound Source Localization | Code available 0
TADFormer: Task-Adaptive Dynamic TransFormer for Efficient Multi-Task Learning | Jan 1, 2025 | Multi-Task Learning, parameter-efficient fine-tuning | Unverified 0
Embodied VideoAgent: Persistent Memory from Egocentric Videos and Embodied Sensors Enables Dynamic Scene Understanding | Dec 31, 2024 | Robot Manipulation, Scene Understanding | Unverified 0
OVGaussian: Generalizable 3D Gaussian Segmentation with Open Vocabularies | Dec 31, 2024 | 3DGS, 3D Semantic Segmentation | Code available 0
STORM: Spatio-Temporal Reconstruction Model for Large-Scale Outdoor Scenes | Dec 31, 2024 | Dynamic Reconstruction, Scene Flow Estimation | Code available 3
Text-to-Image GAN with Pretrained Representations | Dec 30, 2024 | Domain Generalization, Image Generation | Unverified 0
4D Gaussian Splatting: Modeling Dynamic Scenes with Native 4D Primitives | Dec 30, 2024 | Novel View Synthesis, Scene Understanding | Unverified 0
MLLM-SUL: Multimodal Large Language Model for Semantic Scene Understanding and Localization in Traffic Scenarios | Dec 27, 2024 | Autonomous Driving, Language Modeling | Code available 0
UniPLV: Towards Label-Efficient Open-World 3D Scene Understanding by Regional Visual Language Supervision | Dec 24, 2024 | Scene Understanding, Semantic Segmentation | Unverified 0
Parallel Neural Computing for Scene Understanding from LiDAR Perception in Autonomous Racing | Dec 24, 2024 | Autonomous Driving, Autonomous Racing | Code available 0
3DGraphLLM: Combining Semantic Graphs and Large Language Models for 3D Scene Understanding | Dec 24, 2024 | Natural Language Understanding, Scene Understanding | Code available 2
LangSurf: Language-Embedded Surface Gaussians for 3D Scene Understanding | Dec 23, 2024 | 3D Semantic Segmentation, Scene Understanding | Unverified 0
Application of Multimodal Large Language Models in Autonomous Driving | Dec 21, 2024 | Autonomous Driving, Decision Making | Unverified 0
Improving Object Detection for Time-Lapse Imagery Using Temporal Features in Wildlife Monitoring | Dec 20, 2024 | Object, object-detection | Code available 0
PC-BEV: An Efficient Polar-Cartesian BEV Fusion Framework for LiDAR Semantic Segmentation | Dec 19, 2024 | LIDAR Semantic Segmentation, Scene Understanding | Code available 1
ObjVariantEnsemble: Advancing Point Cloud LLM Evaluation in Challenging Scenes with Subtly Distinguished Objects | Dec 19, 2024 | Scene Understanding | Unverified 0
AutoTrust: Benchmarking Trustworthiness in Large Vision Language Models for Autonomous Driving | Dec 19, 2024 | Autonomous Driving, Benchmarking | Code available 2
RelationField: Relate Anything in Radiance Fields | Dec 18, 2024 | 3d scene graph generation, Graph Generation | Code available 2
Multi-View Pedestrian Occupancy Prediction with a Novel Synthetic Dataset | Dec 18, 2024 | Pedestrian Detection, Scene Understanding | Unverified 0
GAGS: Granularity-Aware Feature Distillation for Language Gaussian Splatting | Dec 18, 2024 | Scene Understanding, Semantic Segmentation | Unverified 0
Dense Audio-Visual Event Localization under Cross-Modal Consistency and Multi-Temporal Granularity Collaboration | Dec 17, 2024 | audio-visual event localization, audio-visual learning | Code available 1
An Enhanced Classification Method Based on Adaptive Multi-Scale Fusion for Long-tailed Multispectral Point Clouds | Dec 16, 2024 | Classification, Scene Understanding | Unverified 0
DINO-Foresight: Looking into the Future with DINO | Dec 16, 2024 | Autonomous Driving, Scene Understanding | Code available 2
Emma-X: An Embodied Multimodal Action Model with Grounded Chain of Thought and Look-ahead Spatial Reasoning | Dec 16, 2024 | Hallucination, Robot Manipulation | Code available 2
SuperGSeg: Open-Vocabulary 3D Segmentation with Structured Super-Gaussians | Dec 13, 2024 | GPU, Object Localization | Unverified 0
WiseAD: Knowledge Augmented End-to-End Autonomous Driving with Vision-Language Model | Dec 13, 2024 | Autonomous Driving, Decision Making | Code available 1
MAGIC: Mastering Physical Adversarial Generation in Context through Collaborative LLM Agents | Dec 11, 2024 | object-detection, Object Detection | Unverified 0
TGOSPA Metric Parameters Selection and Evaluation for Visual Multi-object Tracking | Dec 11, 2024 | Multi-Object Tracking, Object Tracking | Unverified 0
SLGaussian: Fast Language Gaussian Splatting in Sparse Views | Dec 11, 2024 | 3DGS, Autonomous Navigation | Unverified 0
Event fields: Capturing light fields at high speed, resolution, and dynamic range | Dec 9, 2024 | Depth Estimation, Scene Understanding | Unverified 0
LLaVA-SpaceSGG: Visual Instruct Tuning for Open-vocabulary Scene Graph Generation with Enhanced Spatial Relations | Dec 9, 2024 | Language Modeling, Language Modelling | Code available 1
Visual Lexicon: Rich Image Features in Language Space | Dec 9, 2024 | Image Generation, Image Reconstruction | Unverified 0
TB-HSU: Hierarchical 3D Scene Understanding with Contextual Affordances | Dec 7, 2024 | Multi-Task Learning, Object | Unverified 0