| TransforMerger: Transformer-based Voice-Gesture Fusion for Robust Human-Robot Communication | Apr 2, 2025 | Language Modeling, Language Modelling | Unverified | 0 |
| A Diffusion-Based Framework for Occluded Object Movement | Apr 2, 2025 | Object, World Knowledge | Unverified | 0 |
| v-CLR: View-Consistent Learning for Open-World Instance Segmentation | Apr 2, 2025 | Instance Segmentation, Object | Code Available | 1 |
| Deep LG-Track: An Enhanced Localization-Confidence-Guided Multi-Object Tracker | Apr 2, 2025 | Autonomous Driving, Multi-Object Tracking | Unverified | 0 |
| Towards Unified Referring Expression Segmentation Across Omni-Level Visual Target Granularities | Apr 2, 2025 | Descriptive, Large Language Model | Code Available | 0 |
| Detail-aware multi-view stereo network for depth estimation | Mar 31, 2025 | Depth Estimation, Image Generation | Code Available | 0 |
| MB-ORES: A Multi-Branch Object Reasoner for Visual Grounding in Remote Sensing | Mar 31, 2025 | Object, object-detection | Code Available | 0 |
| Physically Ground Commonsense Knowledge for Articulated Object Manipulation with Analytic Concepts | Mar 30, 2025 | Object | Unverified | 0 |
| EagleVision: Object-level Attribute Multimodal LLM for Remote Sensing | Mar 30, 2025 | Attribute, Disentanglement | Code Available | 1 |
| Object Isolated Attention for Consistent Story Visualization | Mar 30, 2025 | Object, Story Visualization | Unverified | 0 |
| ReferDINO-Plus: 2nd Solution for 4th PVUW MeViS Challenge at CVPR 2025 | Mar 30, 2025 | Object, Referring Video Object Segmentation | Code Available | 0 |
| DASH: Detection and Assessment of Systematic Hallucinations of VLMs | Mar 30, 2025 | Object | Code Available | 1 |
| Context in object detection: a systematic literature review | Mar 29, 2025 | Few-Shot Object Detection, Object | Unverified | 0 |
| Efficient Explicit Joint-level Interaction Modeling with Mamba for Text-guided HOI Generation | Mar 29, 2025 | Human-Object Interaction Detection, Mamba | Code Available | 0 |
| Hyperspectral Adapter for Object Tracking based on Hyperspectral Video | Mar 28, 2025 | Object, Object Tracking | Unverified | 0 |
| The Marine Debris Forward-Looking Sonar Datasets | Mar 28, 2025 | Diversity, Object | Unverified | 0 |
| SIGHT: Single-Image Conditioned Generation of Hand Trajectories for Hand-Object Interaction | Mar 28, 2025 | Motion Generation, Object | Unverified | 0 |
| ForcePose: A Deep Learning Approach for Force Calculation Based on Action Recognition Using MediaPipe Pose Estimation Combined with Object Detection | Mar 28, 2025 | Action Recognition, Human-Object Interaction Detection | Unverified | 0 |
| SemAlign3D: Semantic Correspondence between RGB-Images through Aligning 3D Object-Class Representations | Mar 28, 2025 | Object, Semantic correspondence | Unverified | 0 |
| Segment then Splat: A Unified Approach for 3D Open-Vocabulary Segmentation based on Gaussian Splatting | Mar 28, 2025 | 3D Object Retrieval, Object | Unverified | 0 |
| TranSplat: Lighting-Consistent Cross-Scene Object Transfer with 3D Gaussian Splatting | Mar 28, 2025 | Object | Unverified | 0 |
| VisTa: Visual-contextual and Text-augmented Zero-shot Object-level OOD Detection | Mar 28, 2025 | Object, Out of Distribution (OOD) Detection | Unverified | 0 |
| RUNA: Object-level Out-of-Distribution Detection via Regional Uncertainty Alignment of Multimodal Representations | Mar 28, 2025 | Object, Out-of-Distribution Detection | Unverified | 0 |
| AGILE: A Diffusion-Based Attention-Guided Image and Label Translation for Efficient Cross-Domain Plant Trait Identification | Mar 27, 2025 | Denoising, Object | Code Available | 0 |
| BOOTPLACE: Bootstrapped Object Placement with Detection Transformers | Mar 27, 2025 | Data Augmentation, Object | Code Available | 1 |
| Visual Jenga: Discovering Object Dependencies via Counterfactual Inpainting | Mar 27, 2025 | counterfactual, Object | Unverified | 0 |
| Learning Class Prototypes for Unified Sparse Supervised 3D Object Detection | Mar 27, 2025 | 3D Object Detection, Object | Code Available | 1 |
| CTRL-O: Language-Controllable Object-Centric Visual Representation Learning | Mar 27, 2025 | Image Generation, Object | Unverified | 0 |
| RelTriple: Learning Plausible Indoor Layouts by Integrating Relationship Triples into the Diffusion Process | Mar 26, 2025 | Collision Avoidance, Layout Generation | Unverified | 0 |
| GLRD: Global-Local Collaborative Reason and Debate with PSL for 3D Open-Vocabulary Detection | Mar 26, 2025 | Common Sense Reasoning, Object | Unverified | 0 |
| Incremental Object Keypoint Learning | Mar 26, 2025 | Keypoint Estimation, Object | Unverified | 0 |
| Guiding Human-Object Interactions with Rich Geometry and Relations | Mar 26, 2025 | Human-Object Interaction Detection, Motion Generation | Unverified | 0 |
| LayerCraft: Enhancing Text-to-Image Generation with CoT Reasoning and Layered Object Integration | Mar 25, 2025 | Image Generation, Object | Code Available | 0 |
| DynOPETs: A Versatile Benchmark for Dynamic Object Pose Estimation and Tracking in Moving Camera Scenarios | Mar 25, 2025 | 3D Object Detection, Object | Code Available | 1 |
| Learning 3D Object Spatial Relationships from Pre-trained 2D Diffusion Models | Mar 25, 2025 | Object | Unverified | 0 |
| COB-GS: Clear Object Boundaries in 3DGS Segmentation Based on Boundary-Adaptive Gaussian Splitting | Mar 25, 2025 | 3DGS, Object | Code Available | 2 |
| CamSAM2: Segment Anything Accurately in Camouflaged Videos | Mar 25, 2025 | Camouflaged Object Segmentation, Object | Code Available | 1 |
| Visuo-Tactile Object Pose Estimation for a Multi-Finger Robot Hand with Low-Resolution In-Hand Tactile Sensing | Mar 25, 2025 | 3D Pose Estimation, Object | Unverified | 0 |
| Beyond Object Categories: Multi-Attribute Reference Understanding for Visual Grounding | Mar 25, 2025 | Attribute, Object | Unverified | 0 |
| Zero-Shot Human-Object Interaction Synthesis with Multimodal Priors | Mar 25, 2025 | Diversity, Human-Object Interaction Detection | Unverified | 0 |
| Benchmarking Object Detectors under Real-World Distribution Shifts in Satellite Imagery | Mar 24, 2025 | Benchmarking, Humanitarian | Code Available | 1 |
| Human-Object Interaction with Vision-Language Model Guided Relative Movement Dynamics | Mar 24, 2025 | Human-Object Interaction Detection, Language Modeling | Unverified | 0 |
| Global-Local Tree Search in VLMs for 3D Indoor Scene Generation | Mar 24, 2025 | Common Sense Reasoning, Object | Code Available | 1 |
| CQ-DINO: Mitigating Gradient Dilution via Category Queries for Vast Vocabulary Object Detection | Mar 24, 2025 | Object, object-detection | Code Available | 0 |
| Online 3D Scene Reconstruction Using Neural Object Priors | Mar 24, 2025 | 3D Scene Reconstruction, Object | Unverified | 0 |
| Any6D: Model-free 6D Pose Estimation of Novel Objects | Mar 24, 2025 | 6D Pose Estimation, 6D Pose Estimation using RGB | Unverified | 0 |
| An Image-like Diffusion Method for Human-Object Interaction Detection | Mar 23, 2025 | Human-Object Interaction Detection, Image Generation | Unverified | 0 |
| Decorum: A Language-Based Approach For Style-Conditioned Synthesis of Indoor 3D Scenes | Mar 23, 2025 | Object, Retrieval | Unverified | 0 |
| OmnimatteZero: Training-free Real-time Omnimatte with Pre-trained Video Diffusion Models | Mar 23, 2025 | Image Inpainting, Object | Unverified | 0 |
| Shapley-Scarf Markets with Objective Indifferences | Mar 23, 2025 | Object | Unverified | 0 |