| Marmot: Multi-Agent Reasoning for Multi-Object Self-Correcting in Improving Image-Text Alignment | Apr 10, 2025 | AI Agent, Attribute | Unverified | 0 |
| BoxDreamer: Dreaming Box Corners for Generalizable Object Pose Estimation | Apr 10, 2025 | Object, Pose Estimation | Unverified | 0 |
| POEM: Precise Object-level Editing via MLLM control | Apr 10, 2025 | Image Generation, Object | Unverified | 0 |
| Glossy Object Reconstruction with Cost-effective Polarized Acquisition | Apr 9, 2025 | 3D Reconstruction, Novel View Synthesis | Unverified | 0 |
| Compass Control: Multi Object Orientation Control for Text-to-Image Generation | Apr 9, 2025 | Image Generation, Object | Unverified | 0 |
| MovSAM: A Single-image Moving Object Segmentation Framework Based on Deep Thinking | Apr 9, 2025 | Autonomous Driving, Language Modeling | Code Available | 0 |
| DLTPose: 6DoF Pose Estimation From Accurate Dense Surface Point Estimates | Apr 9, 2025 | Object, Pose Estimation | Unverified | 0 |
| Better Decisions through the Right Causal World Model | Apr 9, 2025 | Causal Inference, Model extraction | Unverified | 0 |
| A Self-Supervised Framework for Space Object Behaviour Characterisation | Apr 8, 2025 | Anomaly Detection, Earth Observation | Unverified | 0 |
| PRIMEDrive-CoT: A Precognitive Chain-of-Thought Framework for Uncertainty-Aware Object Interaction in Driving Scene Scenario | Apr 8, 2025 | 3D Object Detection, Autonomous Driving | Unverified | 0 |
| D-Feat Occlusions: Diffusion Features for Robustness to Partial Visual Occlusions in Object Recognition | Apr 8, 2025 | Image Generation, Object | Unverified | 0 |
| Grounding 3D Object Affordance with Language Instructions, Visual Observations and Interactions | Apr 7, 2025 | Object | Unverified | 0 |
| EMF: Event Meta Formers for Event-based Real-time Traffic Object Detection | Apr 5, 2025 | Autonomous Driving, Object | Unverified | 0 |
| CornerPoint3D: Look at the Nearest Corner Instead of the Center | Apr 3, 2025 | 3D Object Detection, Object | Unverified | 0 |
| RASP: Revisiting 3D Anamorphic Art for Shadow-Guided Packing of Irregular Objects | Apr 3, 2025 | Object | Unverified | 0 |
| Deep Reinforcement Learning via Object-Centric Attention | Apr 3, 2025 | Deep Reinforcement Learning, Inductive Bias | Code Available | 0 |
| COST: Contrastive One-Stage Transformer for Vision-Language Small Object Tracking | Apr 2, 2025 | cross-modal alignment, Object | Unverified | 0 |
| TransforMerger: Transformer-based Voice-Gesture Fusion for Robust Human-Robot Communication | Apr 2, 2025 | Language Modeling, Language Modelling | Unverified | 0 |
| A Diffusion-Based Framework for Occluded Object Movement | Apr 2, 2025 | Object, World Knowledge | Unverified | 0 |
| Slot-Level Robotic Placement via Visual Imitation from Single Human Video | Apr 2, 2025 | Object | Unverified | 0 |
| Towards Unified Referring Expression Segmentation Across Omni-Level Visual Target Granularities | Apr 2, 2025 | Descriptive, Large Language Model | Code Available | 0 |
| Deep LG-Track: An Enhanced Localization-Confidence-Guided Multi-Object Tracker | Apr 2, 2025 | Autonomous Driving, Multi-Object Tracking | Unverified | 0 |
| MB-ORES: A Multi-Branch Object Reasoner for Visual Grounding in Remote Sensing | Mar 31, 2025 | Object, object-detection | Code Available | 0 |
| Detail-aware multi-view stereo network for depth estimation | Mar 31, 2025 | Depth Estimation, Image Generation | Code Available | 0 |
| Physically Ground Commonsense Knowledge for Articulated Object Manipulation with Analytic Concepts | Mar 30, 2025 | Object | Unverified | 0 |
| Object Isolated Attention for Consistent Story Visualization | Mar 30, 2025 | Object, Story Visualization | Unverified | 0 |
| ReferDINO-Plus: 2nd Solution for 4th PVUW MeViS Challenge at CVPR 2025 | Mar 30, 2025 | Object, Referring Video Object Segmentation | Code Available | 0 |
| Context in object detection: a systematic literature review | Mar 29, 2025 | Few-Shot Object Detection, Object | Unverified | 0 |
| Efficient Explicit Joint-level Interaction Modeling with Mamba for Text-guided HOI Generation | Mar 29, 2025 | Human-Object Interaction Detection, Mamba | Code Available | 0 |
| Segment then Splat: A Unified Approach for 3D Open-Vocabulary Segmentation based on Gaussian Splatting | Mar 28, 2025 | 3D Object Retrieval, Object | Unverified | 0 |
| RUNA: Object-level Out-of-Distribution Detection via Regional Uncertainty Alignment of Multimodal Representations | Mar 28, 2025 | Object, Out-of-Distribution Detection | Unverified | 0 |
| SIGHT: Single-Image Conditioned Generation of Hand Trajectories for Hand-Object Interaction | Mar 28, 2025 | Motion Generation, Object | Unverified | 0 |
| VisTa: Visual-contextual and Text-augmented Zero-shot Object-level OOD Detection | Mar 28, 2025 | Object, Out of Distribution (OOD) Detection | Unverified | 0 |
| ForcePose: A Deep Learning Approach for Force Calculation Based on Action Recognition Using MediaPipe Pose Estimation Combined with Object Detection | Mar 28, 2025 | Action Recognition, Human-Object Interaction Detection | Unverified | 0 |
| SemAlign3D: Semantic Correspondence between RGB-Images through Aligning 3D Object-Class Representations | Mar 28, 2025 | Object, Semantic correspondence | Unverified | 0 |
| TranSplat: Lighting-Consistent Cross-Scene Object Transfer with 3D Gaussian Splatting | Mar 28, 2025 | Object | Unverified | 0 |
| Hyperspectral Adapter for Object Tracking based on Hyperspectral Video | Mar 28, 2025 | Object, Object Tracking | Unverified | 0 |
| The Marine Debris Forward-Looking Sonar Datasets | Mar 28, 2025 | Diversity, Object | Unverified | 0 |
| CTRL-O: Language-Controllable Object-Centric Visual Representation Learning | Mar 27, 2025 | Image Generation, Object | Unverified | 0 |
| AGILE: A Diffusion-Based Attention-Guided Image and Label Translation for Efficient Cross-Domain Plant Trait Identification | Mar 27, 2025 | Denoising, Object | Code Available | 0 |
| Visual Jenga: Discovering Object Dependencies via Counterfactual Inpainting | Mar 27, 2025 | counterfactual, Object | Unverified | 0 |
| RelTriple: Learning Plausible Indoor Layouts by Integrating Relationship Triples into the Diffusion Process | Mar 26, 2025 | Collision Avoidance, Layout Generation | Unverified | 0 |
| Guiding Human-Object Interactions with Rich Geometry and Relations | Mar 26, 2025 | Human-Object Interaction Detection, Motion Generation | Unverified | 0 |
| Incremental Object Keypoint Learning | Mar 26, 2025 | Keypoint Estimation, Object | Unverified | 0 |
| GLRD: Global-Local Collaborative Reason and Debate with PSL for 3D Open-Vocabulary Detection | Mar 26, 2025 | Common Sense Reasoning, Object | Unverified | 0 |
| Zero-Shot Human-Object Interaction Synthesis with Multimodal Priors | Mar 25, 2025 | Diversity, Human-Object Interaction Detection | Unverified | 0 |
| Learning 3D Object Spatial Relationships from Pre-trained 2D Diffusion Models | Mar 25, 2025 | Object | Unverified | 0 |
| LayerCraft: Enhancing Text-to-Image Generation with CoT Reasoning and Layered Object Integration | Mar 25, 2025 | Image Generation, Object | Code Available | 0 |
| Visuo-Tactile Object Pose Estimation for a Multi-Finger Robot Hand with Low-Resolution In-Hand Tactile Sensing | Mar 25, 2025 | 3D Pose Estimation, Object | Unverified | 0 |
| Beyond Object Categories: Multi-Attribute Reference Understanding for Visual Grounding | Mar 25, 2025 | Attribute, Object | Unverified | 0 |