| QORT-Former: Query-optimized Real-time Transformer for Understanding Two Hands Manipulating Objects | Feb 27, 2025 | 3D Pose Estimation, Action Recognition | Unverified | 0 |
| MITracker: Multi-View Integration for Visual Object Tracking | Feb 27, 2025 | Object, Object Tracking | Unverified | 0 |
| Analyzing CLIP's Performance Limitations in Multi-Object Scenarios: A Controlled High-Resolution Study | Feb 27, 2025 | Image Generation, Object | Unverified | 0 |
| BEVDiffuser: Plug-and-Play Diffusion Model for BEV Denoising with Ground-Truth Guidance | Feb 27, 2025 | 3D Object Detection, Autonomous Driving | Unverified | 0 |
| Spectral-Enhanced Transformers: Leveraging Large-Scale Pretrained Models for Hyperspectral Object Tracking | Feb 26, 2025 | Object, Object Tracking | Unverified | 0 |
| CoopDETR: A Unified Cooperative Perception Framework for 3D Detection via Object Query | Feb 26, 2025 | Autonomous Vehicles, Object | Unverified | 0 |
| ObjectVLA: End-to-End Open-World Object Manipulation Without Demonstration | Feb 26, 2025 | Imitation Learning, Object | Unverified | 0 |
| Dictionary-based Framework for Interpretable and Consistent Object Parsing | Feb 26, 2025 | Contrastive Learning, Object | Unverified | 0 |
| Joint Reconstruction of Spatially-Coherent and Realistic Clothed Humans and Objects from a Single Image | Feb 25, 2025 | Object, Object Reconstruction | Unverified | 0 |
| FetchBot: Object Fetching in Cluttered Shelves via Zero-Shot Sim2Real | Feb 25, 2025 | Object, Reinforcement Learning (RL) | Unverified | 0 |
| A Distributional Treatment of Real2Sim2Real for Object-Centric Agent Adaptation in Vision-Driven Deformable Linear Object Manipulation | Feb 25, 2025 | Object | Unverified | 0 |
| Enhancing Reusability of Learned Skills for Robot Manipulation via Gaze and Bottleneck | Feb 25, 2025 | Imitation Learning, Object | Unverified | 0 |
| SPARC: Score Prompting and Adaptive Fusion for Zero-Shot Multi-Label Recognition in Vision-Language Models | Feb 24, 2025 | Language Modeling, Language Modelling | Unverified | 0 |
| CRTrack: Low-Light Semi-Supervised Multi-object Tracking Based on Consistency Regularization | Feb 24, 2025 | Multi-Object Tracking, Object | Code Available | 0 |
| V-HOP: Visuo-Haptic 6D Object Pose Tracking | Feb 24, 2025 | Object, Object Tracking | Unverified | 0 |
| MQADet: A Plug-and-Play Paradigm for Enhancing Open-Vocabulary Object Detection via Multimodal Question Answering | Feb 23, 2025 | Object, object-detection | Unverified | 0 |
| Geometry-Aware 3D Salient Object Detection Network | Feb 23, 2025 | Object, object-detection | Unverified | 0 |
| Reasoning about Affordances: Causal and Compositional Reasoning in LLMs | Feb 23, 2025 | Object | Unverified | 0 |
| The Role of Background Information in Reducing Object Hallucination in Vision-Language Models: Insights from Cutoff API Prompting | Feb 21, 2025 | Hallucination, Object | Unverified | 0 |
| ODVerse33: Is the New YOLO Version Always Better? A Multi Domain benchmark from YOLO v5 to v11 | Feb 20, 2025 | Autonomous Driving, Object | Unverified | 0 |
| Watch Less, Feel More: Sim-to-Real RL for Generalizable Articulated Object Manipulation via Motion Adaptation and Impedance Control | Feb 20, 2025 | Motion Planning, Object | Unverified | 0 |
| MSVCOD: A Large-Scale Multi-Scene Dataset for Video Camouflage Object Detection | Feb 19, 2025 | Object, object-detection | Unverified | 0 |
| Object-centric Binding in Contrastive Language-Image Pretraining | Feb 19, 2025 | Image-text matching, Object | Unverified | 0 |
| Capturing Rich Behavior Representations: A Dynamic Action Semantic-Aware Graph Transformer for Video Captioning | Feb 19, 2025 | Knowledge Distillation, Object | Unverified | 0 |
| Object-Pose Estimation With Neural Population Codes | Feb 19, 2025 | CPU, Object | Unverified | 0 |
| RAPTOR: Refined Approach for Product Table Object Recognition | Feb 19, 2025 | Object, Object Recognition | Unverified | 0 |
| MEX: Memory-efficient Approach to Referring Multi-Object Tracking | Feb 19, 2025 | Autonomous Driving, GPU | Unverified | 0 |
| RobuRCDet: Enhancing Robustness of Radar-Camera Fusion in Bird's Eye View for 3D Object Detection | Feb 18, 2025 | 3D Object Detection, Object | Unverified | 0 |
| CAST: Component-Aligned 3D Scene Reconstruction from an RGB Image | Feb 18, 2025 | 3D Generation, 3D Scene Reconstruction | Unverified | 0 |
| ROI-NeRFs: Hi-Fi Visualization of Objects of Interest within a Scene by NeRFs Composition | Feb 18, 2025 | 3D Reconstruction, NeRF | Unverified | 0 |
| Instance-Level Moving Object Segmentation from a Single Image with Events | Feb 18, 2025 | Object, Semantic Segmentation | Unverified | 0 |
| RHINO: Learning Real-Time Humanoid-Human-Object Interaction from Human Demonstrations | Feb 18, 2025 | Human-Object Interaction Detection, Object | Unverified | 0 |
| A Monocular Event-Camera Motion Capture System | Feb 17, 2025 | Object | Unverified | 0 |
| Enhancing Transparent Object Pose Estimation: A Fusion of GDR-Net and Edge Detection | Feb 17, 2025 | 6D Pose Estimation using RGB, Edge Detection | Unverified | 0 |
| Revealing Bias Formation in Deep Neural Networks Through the Geometric Mechanisms of Human Visual Decoupling | Feb 17, 2025 | Object, Object Recognition | Unverified | 0 |
| FocalCount: Towards Class-Count Imbalance in Class-Agnostic Counting | Feb 15, 2025 | Object, Object Counting | Unverified | 0 |
| HIPPo: Harnessing Image-to-3D Priors for Model-free Zero-shot 6D Pose Estimation | Feb 14, 2025 | 3D Reconstruction, 6D Pose Estimation | Unverified | 0 |
| Evaluating the Meta- and Object-Level Reasoning of Large Language Models for Question Answering | Feb 14, 2025 | Mathematical Reasoning, Object | Unverified | 0 |
| Object Detection and Tracking | Feb 14, 2025 | Deep Learning, Object | Code Available | 0 |
| Object-Centric Latent Action Learning | Feb 13, 2025 | Imitation Learning, Object | Unverified | 0 |
| Safe Multi-agent Satellite Servicing with Control Barrier Functions | Feb 13, 2025 | Object, Position | Unverified | 0 |
| CineMaster: A 3D-Aware and Controllable Framework for Cinematic Text-to-Video Generation | Feb 12, 2025 | Object, Text-to-Video Generation | Unverified | 0 |
| Articulate That Object Part (ATOP): 3D Part Articulation from Text and Motion Personalization | Feb 11, 2025 | Image Generation, Motion Generation | Unverified | 0 |
| VidCRAFT3: Camera, Object, and Lighting Control for Image-to-Video Generation | Feb 11, 2025 | Image to Video Generation, Object | Unverified | 0 |
| Dense Object Detection Based on De-homogenized Queries | Feb 11, 2025 | Dense Object Detection, Object | Unverified | 0 |
| Secure Visual Data Processing via Federated Learning | Feb 9, 2025 | Federated Learning, Management | Unverified | 0 |
| Neural Clustering for Prefractured Mesh Generation in Real-time Object Destruction | Feb 7, 2025 | Clustering, Object | Unverified | 0 |
| LP-DETR: Layer-wise Progressive Relations for Object Detection | Feb 7, 2025 | Decoder, Object | Unverified | 0 |
| HD-EPIC: A Highly-Detailed Egocentric Video Dataset | Feb 6, 2025 | Action Recognition, Nutrition | Unverified | 0 |
| Advanced Object Detection and Pose Estimation with Hybrid Task Cascade and High-Resolution Networks | Feb 6, 2025 | Autonomous Driving, Object | Unverified | 0 |