| Learning to Visually Localize Sound Sources from Mixtures without Prior Source Knowledge | Mar 26, 2024 | Object, Sound Source Localization | Code Available | 1 |
| Comp4D: LLM-Guided Compositional 4D Scene Generation | Mar 25, 2024 | Object, Scene Generation | Unverified | 0 |
| Exploiting Priors from 3D Diffusion Models for RGB-Based One-Shot View Planning | Mar 25, 2024 | 3D Generation, Object | Code Available | 0 |
| Co-Occurring of Object Detection and Identification towards unlabeled object discovery | Mar 25, 2024 | Object, object-detection | Unverified | 0 |
| ASDF: Assembly State Detection Utilizing Late Fusion by Integrating 6D Pose Estimation | Mar 25, 2024 | 6D Pose Estimation, Object | Code Available | 0 |
| RCBEVDet: Radar-camera Fusion in Bird's Eye View for 3D Object Detection | Mar 25, 2024 | 3D Object Detection, 3D Object Detection (RoI) | Code Available | 3 |
| Elysium: Exploring Object-level Perception in Videos via MLLM | Mar 25, 2024 | Object, Object Tracking | Code Available | 2 |
| V2X-PC: Vehicle-to-everything Collaborative Perception via Point Cluster | Mar 25, 2024 | Object | Unverified | 0 |
| Data-Efficient 3D Visual Grounding via Order-Aware Referring | Mar 25, 2024 | 3D visual grounding, Object | Unverified | 0 |
| Benchmarks and Challenges in Pose Estimation for Egocentric Hand Interactions with Objects | Mar 25, 2024 | Action Recognition, Motion Generation | Code Available | 3 |
| Multiple Object Tracking as ID Prediction | Mar 25, 2024 | Multi-Object Tracking, Multiple Object Tracking | Code Available | 3 |
| DOCTR: Disentangled Object-Centric Transformer for Point Scene Understanding | Mar 25, 2024 | Decoder, Object | Code Available | 0 |
| Toward Open-Set Human Object Interaction Detection | Mar 24, 2024 | Contrastive Learning, Human-Object Interaction Detection | Code Available | 0 |
| Cross-domain Multi-modal Few-shot Object Detection via Rich Text | Mar 24, 2024 | Cross-Domain Few-Shot, Domain Adaptation | Code Available | 0 |
| Realtime Robust Shape Estimation of Deformable Linear Object | Mar 24, 2024 | Object, Unity | Unverified | 0 |
| Fusion of Active and Passive Measurements for Robust and Scalable Positioning | Mar 24, 2024 | Object | Unverified | 0 |
| Inverse Rendering of Glossy Objects via the Neural Plenoptic Function and Radiance Fields | Mar 24, 2024 | Inverse Rendering, NeRF | Unverified | 0 |
| Object Detectors in the Open Environment: Challenges, Solutions, and Outlook | Mar 24, 2024 | Incremental Learning, Object | Code Available | 1 |
| Gaze-guided Hand-Object Interaction Synthesis: Dataset and Method | Mar 24, 2024 | Denoising, Human motion prediction | Unverified | 0 |
| Towards Two-Stream Foveation-based Active Vision Learning | Mar 24, 2024 | Foveation, Object | Unverified | 0 |
| Temporal-Spatial Object Relations Modeling for Vision-and-Language Navigation | Mar 23, 2024 | Navigate, Object | Unverified | 0 |
| SUP-NeRF: A Streamlined Unification of Pose Estimation and NeRF for Monocular 3D Object Reconstruction | Mar 23, 2024 | 3D Object Reconstruction, 3D Reconstruction | Code Available | 1 |
| PNAS-MOT: Multi-Modal Object Tracking with Pareto Neural Architecture Search | Mar 23, 2024 | Autonomous Driving, Multiple Object Tracking | Code Available | 1 |
| Inpainting-Driven Mask Optimization for Object Removal | Mar 23, 2024 | Image Inpainting, Object | Unverified | 0 |
| InterFusion: Text-Driven Generation of 3D Human-Object Interaction | Mar 22, 2024 | 3D Generation, global-optimization | Code Available | 2 |
| Reasoning-Enhanced Object-Centric Learning for Videos | Mar 22, 2024 | Object, Object Tracking | Unverified | 0 |
| SFOD: Spiking Fusion Object Detector | Mar 22, 2024 | Object, object-detection | Code Available | 1 |
| VRSO: Visual-Centric Reconstruction for Static Object Annotation | Mar 22, 2024 | Object, object-detection | Code Available | 1 |
| Pose-Aware Self-Supervised Learning with Viewpoint Trajectory Regularization | Mar 22, 2024 | Object, Pose Estimation | Code Available | 0 |
| Survey on Modeling of Human-made Articulated Objects | Mar 22, 2024 | Object, Survey | Unverified | 0 |
| PseudoTouch: Efficiently Imaging the Surface Feel of Objects for Robotic Manipulation | Mar 22, 2024 | Object, Object Recognition | Unverified | 0 |
| Zero-Shot Multi-Object Scene Completion | Mar 21, 2024 | Object | Unverified | 0 |
| External Knowledge Enhanced 3D Scene Generation from Sketch | Mar 21, 2024 | Denoising, Object | Unverified | 0 |
| VAPO: Visibility-Aware Keypoint Localization for Efficient 6DoF Object Pose Estimation | Mar 21, 2024 | Object, Pose Estimation | Unverified | 0 |
| Leveraging Large Language Model-based Room-Object Relationships Knowledge for Enhancing Multimodal-Input Object Goal Navigation | Mar 21, 2024 | Common Sense Reasoning, Language Modeling | Unverified | 0 |
| Robust 3D Shape Reconstruction in Zero-Shot from a Single Image in the Wild | Mar 21, 2024 | 3D Shape Reconstruction, Object | Unverified | 0 |
| T-Rex2: Towards Generic Object Detection via Text-Visual Prompt Synergy | Mar 21, 2024 | Contrastive Learning, Descriptive | Code Available | 7 |
| Scene-Graph ViT: End-to-End Open-Vocabulary Visual Relationship Detection | Mar 21, 2024 | Decoder, Object | Unverified | 0 |
| 3D Object Detection from Point Cloud via Voting Step Diffusion | Mar 21, 2024 | 3D Object Detection, Object | Code Available | 0 |
| EC-IoU: Orienting Safety for Object Detectors via Ego-Centric Intersection-over-Union | Mar 20, 2024 | Autonomous Driving, Object | Unverified | 0 |
| DVMNet++: Rethinking Relative Pose Estimation for Unseen Objects | Mar 20, 2024 | Natural Language Understanding, Object | Code Available | 1 |
| MTP: Advancing Remote Sensing Foundation Model via Multi-Task Pretraining | Mar 20, 2024 | Aerial Scene Classification, Building change detection for remote sensing images | Code Available | 3 |
| Few-shot Oriented Object Detection with Memorable Contrastive Learning in Remote Sensing Images | Mar 20, 2024 | Contrastive Learning, Few-Shot Object Detection | Unverified | 0 |
| EcoSense: Energy-Efficient Intelligent Sensing for In-Shore Ship Detection through Edge-Cloud Collaboration | Mar 20, 2024 | Classification, Object | Unverified | 0 |
| 3D Semantic MapNet: Building Maps for Multi-Object Re-Identification in 3D | Mar 19, 2024 | Object | Unverified | 0 |
| SC-Diff: 3D Shape Completion with Latent Diffusion Models | Mar 19, 2024 | Object | Unverified | 0 |
| OV9D: Open-Vocabulary Category-Level 9D Object Pose and Size Estimation | Mar 19, 2024 | Object | Unverified | 0 |
| ComboVerse: Compositional 3D Assets Creation Using Spatially-Aware Diffusion Guidance | Mar 19, 2024 | 3D Generation, Object | Unverified | 0 |
| Instance-Warp: Saliency Guided Image Warping for Unsupervised Domain Adaptation | Mar 19, 2024 | Domain Adaptation, Object | Code Available | 0 |
| DetToolChain: A New Prompting Paradigm to Unleash Detection Ability of MLLM | Mar 19, 2024 | Object, object-detection | Code Available | 1 |