| Generalized Visual Relation Detection with Diffusion Models | Apr 16, 2025 | Graph Generation, Human-Object Interaction Detection | Unverified | 0 |
| RADLER: Radar Object Detection Leveraging Semantic 3D City Models and Self-Supervised Radar-Image Learning | Apr 16, 2025 | Object, object-detection | Unverified | 0 |
| Object Placement for Anything | Apr 16, 2025 | Object | Unverified | 0 |
| A Review of YOLOv12: Attention-Based Enhancements vs. Previous Versions | Apr 16, 2025 | Computational Efficiency, Object | Unverified | 0 |
| GrabS: Generative Embodied Agent for 3D Object Segmentation without Scene Supervision | Apr 16, 2025 | Object, Semantic Segmentation | Code Available | 1 |
| Recent Advance in 3D Object and Scene Generation: A Survey | Apr 16, 2025 | 3D Generation, Object | Unverified | 0 |
| DM-OSVP++: One-Shot View Planning Using 3D Diffusion Models for Active RGB-Based Object Reconstruction | Apr 16, 2025 | Object, Object Reconstruction | Unverified | 0 |
| Safe-Construct: Redefining Construction Safety Violation Recognition as 3D Multi-View Engagement Task | Apr 15, 2025 | 2D Object Detection, Object | Unverified | 0 |
| Weather-Aware Object Detection Transformer for Domain Adaptation | Apr 15, 2025 | Domain Adaptation, Object | Unverified | 0 |
| 3D Object Reconstruction with mmWave Radars | Apr 15, 2025 | 3D Object Reconstruction, 3D Shape Reconstruction | Unverified | 0 |
| HUMOTO: A 4D Dataset of Mocap Human Object Interactions | Apr 14, 2025 | Human-Object Interaction Detection, Motion Generation | Unverified | 0 |
| MASSeg : 2nd Technical Report for 4th PVUW MOSE Track | Apr 14, 2025 | Data Augmentation, Object | Code Available | 0 |
| COUNTS: Benchmarking Object Detectors and Multimodal Large Language Models under Distribution Shifts | Apr 14, 2025 | Benchmarking, Object | Unverified | 0 |
| DiffMOD: Progressive Diffusion Point Denoising for Moving Object Detection in Remote Sensing | Apr 14, 2025 | Denoising, Density Estimation | Unverified | 0 |
| MonoDiff9D: Monocular Category-Level 9D Object Pose Estimation via Diffusion Model | Apr 14, 2025 | Object, Pose Estimation | Code Available | 1 |
| NTIRE 2025 Challenge on Cross-Domain Few-Shot Object Detection: Methods and Results | Apr 14, 2025 | Cross-Domain Few-Shot, Cross-Domain Few-Shot Object Detection | Code Available | 2 |
| Multi-Object Grounding via Hierarchical Contrastive Siamese Transformers | Apr 14, 2025 | Object, Object Localization | Unverified | 0 |
| RICCARDO: Radar Hit Prediction and Convolution for Camera-Radar 3D Object Detection | Apr 12, 2025 | 3D Object Detection, Object | Code Available | 0 |
| Cut-and-Splat: Leveraging Gaussian Splatting for Synthetic Data Generation | Apr 11, 2025 | Depth Estimation, Instance Segmentation | Code Available | 0 |
| Training-free Guidance in Text-to-Video Generation via Multimodal Planning and Structured Noise Initialization | Apr 11, 2025 | Denoising, Object | Unverified | 0 |
| Digital Twin Catalog: A Large-Scale Photorealistic 3D Object Digital Twin Dataset | Apr 11, 2025 | 3D Object Reconstruction, 3D Reconstruction | Unverified | 0 |
| Marmot: Multi-Agent Reasoning for Multi-Object Self-Correcting in Improving Image-Text Alignment | Apr 10, 2025 | AI Agent, Attribute | Unverified | 0 |
| Learning Object Focused Attention | Apr 10, 2025 | Inductive Bias, Object | Unverified | 0 |
| POEM: Precise Object-level Editing via MLLM control | Apr 10, 2025 | Image Generation, Object | Unverified | 0 |
| WS-DETR: Robust Water Surface Object Detection through Vision-Radar Fusion with Detection Transformer | Apr 10, 2025 | Object, object-detection | Unverified | 0 |
| SAMJAM: Zero-Shot Video Scene Graph Generation for Egocentric Kitchen Videos | Apr 10, 2025 | Graph Generation, Object | Unverified | 0 |
| BoxDreamer: Dreaming Box Corners for Generalizable Object Pose Estimation | Apr 10, 2025 | Object, Pose Estimation | Unverified | 0 |
| How Can Objects Help Video-Language Understanding? | Apr 10, 2025 | Image Captioning, Object | Unverified | 0 |
| Are We Done with Object-Centric Learning? | Apr 9, 2025 | Object, Object Discovery | Code Available | 1 |
| DLTPose: 6DoF Pose Estimation From Accurate Dense Surface Point Estimates | Apr 9, 2025 | Object, Pose Estimation | Unverified | 0 |
| Objaverse++: Curated 3D Object Dataset with Quality Annotations | Apr 9, 2025 | 3D Generation, Attribute | Code Available | 2 |
| MovSAM: A Single-image Moving Object Segmentation Framework Based on Deep Thinking | Apr 9, 2025 | Autonomous Driving, Language Modeling | Code Available | 0 |
| Better Decisions through the Right Causal World Model | Apr 9, 2025 | Causal Inference, Model extraction | Unverified | 0 |
| Glossy Object Reconstruction with Cost-effective Polarized Acquisition | Apr 9, 2025 | 3D Reconstruction, Novel View Synthesis | Unverified | 0 |
| Compass Control: Multi Object Orientation Control for Text-to-Image Generation | Apr 9, 2025 | Image Generation, Object | Unverified | 0 |
| PRIMEDrive-CoT: A Precognitive Chain-of-Thought Framework for Uncertainty-Aware Object Interaction in Driving Scene Scenario | Apr 8, 2025 | 3D Object Detection, Autonomous Driving | Unverified | 0 |
| D-Feat Occlusions: Diffusion Features for Robustness to Partial Visual Occlusions in Object Recognition | Apr 8, 2025 | Image Generation, Object | Unverified | 0 |
| A Self-Supervised Framework for Space Object Behaviour Characterisation | Apr 8, 2025 | Anomaly Detection, Earth Observation | Unverified | 0 |
| InteractVLM: 3D Interaction Reasoning from 2D Foundational Models | Apr 7, 2025 | 3D Reconstruction, Object | Code Available | 2 |
| Grounding 3D Object Affordance with Language Instructions, Visual Observations and Interactions | Apr 7, 2025 | Object | Unverified | 0 |
| Playing Non-Embedded Card-Based Games with Reinforcement Learning | Apr 7, 2025 | Board Games, Decision Making | Code Available | 3 |
| Caption Anything in Video: Fine-grained Object-centric Captioning via Spatiotemporal Multimodal Prompting | Apr 7, 2025 | Boundary Detection, Object | Code Available | 2 |
| SAM2MOT: A Novel Paradigm of Multi-Object Tracking by Segmentation | Apr 6, 2025 | Multi-Object Tracking, Object | Code Available | 2 |
| EMF: Event Meta Formers for Event-based Real-time Traffic Object Detection | Apr 5, 2025 | Autonomous Driving, Object | Unverified | 0 |
| Deep Reinforcement Learning via Object-Centric Attention | Apr 3, 2025 | Deep Reinforcement Learning, Inductive Bias | Code Available | 0 |
| CornerPoint3D: Look at the Nearest Corner Instead of the Center | Apr 3, 2025 | 3D Object Detection, Object | Unverified | 0 |
| PicoPose: Progressive Pixel-to-Pixel Correspondence Learning for Novel Object Pose Estimation | Apr 3, 2025 | Object, Pose Estimation | Code Available | 1 |
| RASP: Revisiting 3D Anamorphic Art for Shadow-Guided Packing of Irregular Objects | Apr 3, 2025 | Object | Unverified | 0 |
| TransforMerger: Transformer-based Voice-Gesture Fusion for Robust Human-Robot Communication | Apr 2, 2025 | Language Modeling, Language Modelling | Unverified | 0 |
| Towards Unified Referring Expression Segmentation Across Omni-Level Visual Target Granularities | Apr 2, 2025 | Descriptive, Large Language Model | Code Available | 0 |