| NEXT: Multi-Grained Mixture of Experts via Text-Modulation for Multi-Modal Object Re-ID | May 26, 2025 | Attribute, Caption Generation | Unverified | 0 |
| Progressive Scaling Visual Object Tracking | May 26, 2025 | Object, Object Tracking | Unverified | 0 |
| MaskedManipulator: Versatile Whole-Body Control for Loco-Manipulation | May 25, 2025 | Human-Object Interaction Detection, Object | Unverified | 0 |
| EOTNet: Deep Memory Aided Bayesian Filter for Extended Object Tracking | May 24, 2025 | Object, Object Tracking | Code Available | 0 |
| FusionTrack: End-to-End Multi-Object Tracking in Arbitrary Multi-View Environment | May 24, 2025 | Management, Multi-Object Tracking | Unverified | 0 |
| ThinkVideo: High-Quality Reasoning Video Segmentation with Chain of Thoughts | May 24, 2025 | Image Segmentation, Instance Segmentation | Code Available | 0 |
| SD-OVON: A Semantics-aware Dataset and Benchmark Generation Pipeline for Open-Vocabulary Object Navigation in Dynamic Scenes | May 24, 2025 | Object | Unverified | 0 |
| RQR3D: Reparametrizing the regression targets for BEV-based 3D object detection | May 23, 2025 | 3D Object Detection, Autonomous Driving | Unverified | 0 |
| Object-level Cross-view Geo-localization with Location Enhancement and Multi-Head Cross Attention | May 23, 2025 | Few-Shot Learning, geo-localization | Code Available | 1 |
| Adapting SAM 2 for Visual Object Tracking: 1st Place Solution for MMVPR Challenge Multi-Modal Tracking | May 23, 2025 | Object, Object Tracking | Unverified | 0 |
| Sampling Strategies for Efficient Training of Deep Learning Object Detection Algorithms | May 23, 2025 | Deep Learning, Object | Unverified | 0 |
| MAFE R-CNN: Selecting More Samples to Learn Category-aware Features for Small Object Detection | May 22, 2025 | Object, object-detection | Unverified | 0 |
| Semantic Compression of 3D Objects for Open and Collaborative Virtual Worlds | May 22, 2025 | Object, Semantic Compression | Unverified | 0 |
| Embodied Agents Meet Personalization: Exploring Memory Utilization for Personalized Assistance | May 22, 2025 | Object, Object Rearrangement | Unverified | 0 |
| TextureSAM: Towards a Texture Aware Foundation Model for Segmentation | May 22, 2025 | Material Classification, Object | Unverified | 0 |
| MEgoHand: Multimodal Egocentric Hand-Object Interaction Motion Generation | May 22, 2025 | Motion Generation, Object | Unverified | 0 |
| Investigating Fine- and Coarse-grained Structural Correspondences Between Deep Neural Networks and Human Object Image Similarity Judgments Using Unsupervised Alignment | May 22, 2025 | Object, Self-Supervised Learning | Unverified | 0 |
| PromptTAD: Object-Prompt Enhanced Traffic Anomaly Detection | May 22, 2025 | Anomaly Detection, Object | Code Available | 0 |
| gen2seg: Generative Models Enable Generalizable Instance Segmentation | May 21, 2025 | Decoder, Instance Segmentation | Unverified | 0 |
| Expanding Zero-Shot Object Counting with Rich Prompts | May 21, 2025 | Object, Object Counting | Unverified | 0 |
| InstructSAM: A Training-Free Framework for Instruction-Oriented Remote Sensing Object Recognition | May 21, 2025 | Earth Observation, Object | Code Available | 2 |
| Multispectral Detection Transformer with Infrared-Centric Sensor Fusion | May 21, 2025 | Multispectral Object Detection, Object | Code Available | 0 |
| RAZER: Robust Accelerated Zero-Shot 3D Open-Vocabulary Panoptic Reconstruction with Spatio-Temporal Aggregation | May 21, 2025 | GPU, Natural Language Queries | Unverified | 0 |
| Object-Focus Actor for Data-efficient Robot Generalization Dexterous Manipulation | May 21, 2025 | Object, Pose Estimation | Unverified | 0 |
| Optimizing Retrieval Augmented Generation for Object Constraint Language | May 19, 2025 | Large Language Model, Object | Unverified | 0 |
| LiDAR MOT-DETR: A LiDAR-based Two-Stage Transformer for 3D Multiple Object Tracking | May 19, 2025 | Multi-Object Tracking, Multiple Object Tracking | Unverified | 0 |
| Dynamic Graph Induced Contour-aware Heat Conduction Network for Event-based Object Detection | May 19, 2025 | Event-based vision, Object | Code Available | 2 |
| OPA-Pack: Object-Property-Aware Robotic Bin Packing | May 19, 2025 | Object, Q-Learning | Unverified | 0 |
| Emergent Active Perception and Dexterity of Simulated Humanoids from Visual Reinforcement Learning | May 18, 2025 | Object | Unverified | 0 |
| GTR: Gaussian Splatting Tracking and Reconstruction of Unknown Objects Based on Appearance and Geometric Complexity | May 17, 2025 | 3D Reconstruction, Object | Unverified | 0 |
| Feasibility with Language Models for Open-World Compositional Zero-Shot Learning | May 16, 2025 | Attribute, Compositional Zero-Shot Learning | Unverified | 0 |
| PARSEC: Preference Adaptation for Robotic Object Rearrangement from Scene Context | May 16, 2025 | Object, Object Rearrangement | Code Available | 0 |
| AW-GATCN: Adaptive Weighted Graph Attention Convolutional Network for Event Camera Data Joint Denoising and Object Recognition | May 16, 2025 | Denoising, Event Segmentation | Unverified | 0 |
| RefPose: Leveraging Reference Geometric Correspondences for Accurate 6D Pose Estimation of Unseen Objects | May 16, 2025 | 6D Pose Estimation, Object | Unverified | 0 |
| A High-Performance Thermal Infrared Object Detection Framework with Centralized Regulation | May 16, 2025 | Object, object-detection | Unverified | 0 |
| StoryReasoning Dataset: Using Chain-of-Thought for Scene Understanding and Grounded Story Generation | May 15, 2025 | Face Recognition, Object | Code Available | 1 |
| MIRAGE: A Multi-modal Benchmark for Spatial Perception, Reasoning, and Intelligence | May 15, 2025 | Attribute, Object | Unverified | 0 |
| ManipBench: Benchmarking Vision-Language Models for Low-Level Robot Manipulation | May 14, 2025 | Benchmarking, Deformable Object Manipulation | Unverified | 0 |
| MoRAL: Motion-aware Multi-Frame 4D Radar and LiDAR Fusion for Robust 3D Object Detection | May 14, 2025 | 3D Object Detection, Autonomous Driving | Unverified | 0 |
| Camera-Only 3D Panoptic Scene Completion for Autonomous Driving through Differentiable Object Shapes | May 14, 2025 | 3D Semantic Scene Completion, Autonomous Driving | Code Available | 0 |
| Beyond General Prompts: Automated Prompt Refinement using Contrastive Class Alignment Scores for Disambiguating Objects in Vision-Language Models | May 14, 2025 | Language Modeling, Language Modelling | Unverified | 0 |
| Robustness Analysis against Adversarial Patch Attacks in Fully Unmanned Stores | May 13, 2025 | Object, object-detection | Unverified | 0 |
| Leveraging Multi-Modal Information to Enhance Dataset Distillation | May 13, 2025 | Dataset Distillation, Object | Unverified | 0 |
| Towards Autonomous UAV Visual Object Search in City Space: Benchmark and Agentic Methodology | May 13, 2025 | Denoising, Object | Unverified | 0 |
| HMPNet: A Feature Aggregation Architecture for Maritime Object Detection from a Shipborne Perspective | May 13, 2025 | Computational Efficiency, Object | Code Available | 0 |
| Object detection in adverse weather conditions for autonomous vehicles using Instruct Pix2Pix | May 13, 2025 | Autonomous Driving, Autonomous Vehicles | Unverified | 0 |
| Improving Unsupervised Task-driven Models of Ventral Visual Stream via Relative Position Predictivity | May 13, 2025 | Contrastive Learning, Object | Code Available | 0 |
| Asynchronous Multi-Object Tracking with an Event Camera | May 12, 2025 | Multi-Object Tracking, Object | Code Available | 1 |
| Towards Accurate State Estimation: Kalman Filter Incorporating Motion Dynamics for 3D Multi-Object Tracking | May 12, 2025 | 3D Multi-Object Tracking, Multi-Object Tracking | Unverified | 0 |
| Hybrid Spiking Vision Transformer for Object Detection with Event Cameras | May 12, 2025 | Event Detection, Object | Unverified | 0 |