| M^3-VOS: Multi-Phase, Multi-Transition, and Multi-Scenery Video Object Segmentation | Jun 15, 2025 | Object, Semantic Segmentation | Code Available | 1 |
| Multiple Object Stitching for Unsupervised Representation Learning | Jun 9, 2025 | Contrastive Learning, Object | Code Available | 1 |
| LPOI: Listwise Preference Optimization for Vision Language Models | May 27, 2025 | Object | Code Available | 1 |
| Locality-Aware Zero-Shot Human-Object Interaction Detection | May 26, 2025 | Human-Object Interaction Detection, Object | Code Available | 1 |
| ReaMOT: A Benchmark and Framework for Reasoning-based Multi-Object Tracking | May 26, 2025 | Multi-Object Tracking, Object | Code Available | 1 |
| Object-level Cross-view Geo-localization with Location Enhancement and Multi-Head Cross Attention | May 23, 2025 | Few-Shot Learning, geo-localization | Code Available | 1 |
| StoryReasoning Dataset: Using Chain-of-Thought for Scene Understanding and Grounded Story Generation | May 15, 2025 | Face Recognition, Object | Code Available | 1 |
| Asynchronous Multi-Object Tracking with an Event Camera | May 12, 2025 | Multi-Object Tracking, Object | Code Available | 1 |
| A Simple Detector with Frame Dynamics is a Strong Tracker | May 8, 2025 | Object, object-detection | Code Available | 1 |
| CDFormer: Cross-Domain Few-Shot Object Detection Transformer Against Feature Confusion | May 2, 2025 | Cross-Domain Few-Shot, Cross-Domain Few-Shot Object Detection | Code Available | 1 |
| LLM-Empowered Embodied Agent for Memory-Augmented Task Planning in Household Robotics | Apr 30, 2025 | In-Context Learning, Object | Code Available | 1 |
| GrabS: Generative Embodied Agent for 3D Object Segmentation without Scene Supervision | Apr 16, 2025 | Object, Semantic Segmentation | Code Available | 1 |
| MonoDiff9D: Monocular Category-Level 9D Object Pose Estimation via Diffusion Model | Apr 14, 2025 | Object, Pose Estimation | Code Available | 1 |
| Are We Done with Object-Centric Learning? | Apr 9, 2025 | Object, Object Discovery | Code Available | 1 |
| PicoPose: Progressive Pixel-to-Pixel Correspondence Learning for Novel Object Pose Estimation | Apr 3, 2025 | Object, Pose Estimation | Code Available | 1 |
| v-CLR: View-Consistent Learning for Open-World Instance Segmentation | Apr 2, 2025 | Instance Segmentation, Object | Code Available | 1 |
| DASH: Detection and Assessment of Systematic Hallucinations of VLMs | Mar 30, 2025 | Object | Code Available | 1 |
| EagleVision: Object-level Attribute Multimodal LLM for Remote Sensing | Mar 30, 2025 | Attribute, Disentanglement | Code Available | 1 |
| BOOTPLACE: Bootstrapped Object Placement with Detection Transformers | Mar 27, 2025 | Data Augmentation, Object | Code Available | 1 |
| Learning Class Prototypes for Unified Sparse Supervised 3D Object Detection | Mar 27, 2025 | 3D Object Detection, Object | Code Available | 1 |
| DynOPETs: A Versatile Benchmark for Dynamic Object Pose Estimation and Tracking in Moving Camera Scenarios | Mar 25, 2025 | 3D Object Detection, Object | Code Available | 1 |
| CamSAM2: Segment Anything Accurately in Camouflaged Videos | Mar 25, 2025 | Camouflaged Object Segmentation, Object | Code Available | 1 |
| Benchmarking Object Detectors under Real-World Distribution Shifts in Satellite Imagery | Mar 24, 2025 | Benchmarking, Humanitarian | Code Available | 1 |
| Global-Local Tree Search in VLMs for 3D Indoor Scene Generation | Mar 24, 2025 | Common Sense Reasoning, Object | Code Available | 1 |
| GOAL: Global-local Object Alignment Learning | Mar 22, 2025 | Descriptive, Object | Code Available | 1 |
| UltraFlwr -- An Efficient Federated Medical and Surgical Object Detection Framework | Mar 19, 2025 | Federated Learning, Object | Code Available | 1 |
| GIVEPose: Gradual Intra-class Variation Elimination for RGB-based Category-Level Object Pose Estimation | Mar 19, 2025 | Object, Pose Estimation | Code Available | 1 |
| MMR: A Large-scale Benchmark Dataset for Multi-target and Multi-granularity Reasoning Segmentation | Mar 18, 2025 | Object, Reasoning Segmentation | Code Available | 1 |
| Robust Object Detection of Underwater Robot based on Domain Generalization | Mar 18, 2025 | Domain Generalization, Object | Code Available | 1 |
| History-Aware Transformation of ReID Features for Multiple Object Tracking | Mar 16, 2025 | Multi-Object Tracking, Multiple Object Tracking | Code Available | 1 |
| OmniSTVG: Toward Spatio-Temporal Omni-Object Video Grounding | Mar 13, 2025 | Object, Video Grounding | Code Available | 1 |
| Learning to Detect Objects from Multi-Agent LiDAR Scans without Manual Labels | Mar 11, 2025 | 3D Object Detection, Object | Code Available | 1 |
| SimROD: A Simple Baseline for Raw Object Detection with Global and Local Enhancements | Mar 10, 2025 | Object, object-detection | Code Available | 1 |
| A Data-Centric Revisit of Pre-Trained Vision Models for Robot Learning | Mar 10, 2025 | Object, Scene Understanding | Code Available | 1 |
| DQO-MAP: Dual Quadrics Multi-Object mapping with Gaussian Splatting | Mar 4, 2025 | Computational Efficiency, CPU | Code Available | 1 |
| Convex Hull-based Algebraic Constraint for Visual Quadric SLAM | Mar 3, 2025 | Object, Object Reconstruction | Code Available | 1 |
| Modeling Fine-Grained Hand-Object Dynamics for Egocentric Video Representation Learning | Mar 2, 2025 | Large Language Model, Multi-Instance Retrieval | Code Available | 1 |
| Mitigating Hallucinations in Large Vision-Language Models by Adaptively Constraining Information Flow | Feb 28, 2025 | Hallucination, Object | Code Available | 1 |
| Dynamic Markov Blanket Detection for Macroscopic Physics Discovery | Feb 28, 2025 | Object | Code Available | 1 |
| C-Drag: Chain-of-Thought Driven Motion Controller for Video Generation | Feb 27, 2025 | Object, Video Generation | Code Available | 1 |
| CLIP Under the Microscope: A Fine-Grained Analysis of Multi-Object Representation | Feb 27, 2025 | Image-text matching, Object | Code Available | 1 |
| Vector-Quantized Vision Foundation Models for Object-Centric Learning | Feb 27, 2025 | Object, Object Discovery | Code Available | 1 |
| Ev-3DOD: Pushing the Temporal Boundaries of 3D Object Detection with Event Cameras | Feb 26, 2025 | 3D Object Detection, Autonomous Driving | Code Available | 1 |
| Cross-domain Few-shot Object Detection with Multi-modal Textual Enrichment | Feb 23, 2025 | Cross-Domain Few-Shot, Cross-Domain Few-Shot Object Detection | Code Available | 1 |
| Object-Centric Image to Video Generation with Language Guidance | Feb 17, 2025 | Image to Video Generation, Object | Code Available | 1 |
| DA-Mamba: Domain Adaptive Hybrid Mamba-Transformer Based One-Stage Object Detection | Feb 16, 2025 | Domain Adaptation, Knowledge Distillation | Code Available | 1 |
| Knowing Your Target: Target-Aware Transformer Makes Better Spatio-Temporal Video Grounding | Feb 16, 2025 | Attribute, Object | Code Available | 1 |
| PlaySlot: Learning Inverse Latent Dynamics for Controllable Object-Centric Video Prediction and Planning | Feb 11, 2025 | Object, Video Prediction | Code Available | 1 |
| SAVE: Self-Attention on Visual Embedding for Zero-Shot Generic Object Counting | Feb 10, 2025 | Exemplar-Free Counting, Object | Code Available | 1 |
| TUMTraffic-VideoQA: A Benchmark for Unified Spatio-Temporal Video Understanding in Traffic Scenes | Feb 4, 2025 | Autonomous Driving, Multiple-choice | Code Available | 1 |