| SST-EM: Advanced Metrics for Evaluating Semantic, Spatial and Temporal Aspects in Video Editing | Jan 13, 2025 | Object, object-detection | Code Available | 0 |
| BlobGEN-Vid: Compositional Text-to-Video Generation with Blob Video Representations | Jan 13, 2025 | Object, Text-to-Video Generation | Unverified | 0 |
| Mamba-MOC: A Multicategory Remote Object Counting via State Space Model | Jan 12, 2025 | Mamba, Object | Unverified | 0 |
| UniQ: Unified Decoder with Task-specific Queries for Efficient Scene Graph Generation | Jan 10, 2025 | Decoder, Graph Generation | Unverified | 0 |
| Improving Skeleton-based Action Recognition with Interactive Object Information | Jan 9, 2025 | Action Recognition, Data Augmentation | Code Available | 0 |
| From Simple to Complex Skills: The Case of In-Hand Object Reorientation | Jan 9, 2025 | Object | Unverified | 0 |
| Perception-as-Control: Fine-grained Controllable Image Animation with 3D-aware Motion Representation | Jan 9, 2025 | Image Animation, Object | Unverified | 0 |
| UPAQ: A Framework for Real-Time and Energy-Efficient 3D Object Detection in Autonomous Vehicles | Jan 8, 2025 | 3D Object Detection, Autonomous Vehicles | Unverified | 0 |
| TexHOI: Reconstructing Textures of 3D Unknown Objects in Monocular Hand-Object Interaction Scenes | Jan 7, 2025 | Object | Code Available | 0 |
| Learning to Transfer Human Hand Skills for Robot Manipulations | Jan 7, 2025 | Object, Robot Manipulation | Unverified | 0 |
| AuxDepthNet: Real-Time Monocular 3D Object Detection with Depth-Sensitive Features | Jan 7, 2025 | 3D Object Detection, Computational Efficiency | Unverified | 0 |
| Human Gaze Boosts Object-Centered Representation Learning | Jan 6, 2025 | Gaze Prediction, Object | Unverified | 0 |
| HOGSA: Bimanual Hand-Object Interaction Understanding with 3D Gaussian Splatting Based Data Augmentation | Jan 6, 2025 | 3DGS, Data Augmentation | Unverified | 0 |
| Through-The-Mask: Mask-based Motion Trajectories for Image-to-Video Generation | Jan 6, 2025 | Image to Video Generation, Object | Unverified | 0 |
| MObI: Multimodal Object Inpainting Using Diffusion Models | Jan 6, 2025 | Autonomous Driving, Object | Unverified | 0 |
| Universal Fine-grained Visual Categorization by Concept Guided Learning | Jan 6, 2025 | Fine-Grained Image Classification, Fine-Grained Visual Categorization | Code Available | 0 |
| Hierarchical Compact Clustering Attention (COCA) for Unsupervised Object-Centric Learning | Jan 1, 2025 | Clustering, Decoder | Unverified | 0 |
| UniHOPE: A Unified Approach for Hand-Only and Hand-Object Pose Estimation | Jan 1, 2025 | hand-object pose, Hand Pose Estimation | Code Available | 0 |
| Cross-Modal Distillation for 2D/3D Multi-Object Discovery from 2D Motion | Jan 1, 2025 | Multi-object discovery, Object | Unverified | 0 |
| InterAct: Advancing Large-Scale Versatile 3D Human-Object Interaction Generation | Jan 1, 2025 | Benchmarking, Human-Object Interaction Detection | Unverified | 0 |
| HORP: Human-Object Relation Priors Guided HOI Detection | Jan 1, 2025 | Human-Object Interaction Detection, Object | Unverified | 0 |
| CorrBEV: Multi-View 3D Object Detection by Correlation Learning with Multi-modal Prototypes | Jan 1, 2025 | 3D Object Detection, Autonomous Driving | Unverified | 0 |
| Language-Guided Salient Object Ranking | Jan 1, 2025 | Object, Saliency Ranking | Unverified | 0 |
| Learning Endogenous Attention for Incremental Object Detection | Jan 1, 2025 | Object, object-detection | Unverified | 0 |
| GLASS: Guided Latent Slot Diffusion for Object-Centric Learning | Jan 1, 2025 | Conditional Image Generation, Image Generation | Unverified | 0 |
| FusionSORT: Fusion Methods for Online Multi-object Visual Tracking | Jan 1, 2025 | Object, Visual Tracking | Code Available | 0 |
| Composing Parts for Expressive Object Generation | Jan 1, 2025 | Attribute, Denoising | Unverified | 0 |
| Common3D: Self-Supervised Learning of 3D Morphable Models for Common Objects in Neural Feature Space | Jan 1, 2025 | Instance Segmentation, Object | Code Available | 0 |
| Rethinking Correspondence-based Category-Level Object Pose Estimation | Jan 1, 2025 | Object, Pose Estimation | Unverified | 0 |
| Learning Class Prototypes for Unified Sparse-Supervised 3D Object Detection | Jan 1, 2025 | 3D Object Detection, Object | Unverified | 0 |
| Robust Multi-Object 4D Generation for In-the-wild Videos | Jan 1, 2025 | Object, Scene Generation | Unverified | 0 |
| PICO: Reconstructing 3D People In Contact with Objects | Jan 1, 2025 | Human-Object Interaction Detection, Object | Unverified | 0 |
| SET: Spectral Enhancement for Tiny Object Detection | Jan 1, 2025 | Object, object-detection | Unverified | 0 |
| DynaMoDe-NeRF: Motion-aware Deblurring Neural Radiance Field for Dynamic Scenes | Jan 1, 2025 | Deblurring, NeRF | Unverified | 0 |
| LatentHOI: On the Generalizable Hand Object Motion Generation with Latent Hand Diffusion. | Jan 1, 2025 | Motion Generation, Object | Unverified | 0 |
| DreamRelation: Bridging Customization and Relation Generation | Jan 1, 2025 | Image Generation, Object | Unverified | 0 |
| Dragin3D: Image Editing by Dragging in 3D Space | Jan 1, 2025 | 3D Object Reconstruction, continuous-control | Unverified | 0 |
| PIAD: Pose and Illumination agnostic Anomaly Detection | Jan 1, 2025 | Anomaly Detection, Object | Unverified | 0 |
| MAD: Memory-Augmented Detection of 3D Objects | Jan 1, 2025 | Object | Unverified | 0 |
| Learning Partonomic 3D Reconstruction from Image Collections | Jan 1, 2025 | 3D Reconstruction, Image Generation | Code Available | 0 |
| CaMuViD: Calibration-Free Multi-View Detection | Jan 1, 2025 | Camera Calibration, Management | Unverified | 0 |
| ONDA-Pose: Occlusion-Aware Neural Domain Adaptation for Self-Supervised 6D Object Pose Estimation | Jan 1, 2025 | 6D Pose Estimation using RGB, Domain Adaptation | Unverified | 0 |
| Camouflage Anything: Learning to Hide using Controlled Out-painting and Representation Engineering | Jan 1, 2025 | Camouflaged Object Segmentation, Object | Unverified | 0 |
| Radio Frequency Ray Tracing with Neural Object Representation for Enhanced RF Modeling | Jan 1, 2025 | Object | Unverified | 0 |
| Generalizable Object Keypoint Localization from Generative Priors | Jan 1, 2025 | Cross-Domain Few-Shot, Image Generation | Unverified | 0 |
| Perceptual Inductive Bias Is What You Need Before Contrastive Learning | Jan 1, 2025 | Contrastive Learning, Depth Estimation | Unverified | 0 |
| ROD-MLLM: Towards More Reliable Object Detection in Multimodal Large Language Models | Jan 1, 2025 | Large Language Model, Object | Unverified | 0 |
| EntitySAM: Segment Everything in Video | Jan 1, 2025 | Decoder, Object | Unverified | 0 |
| HOIGPT: Learning Long-Sequence Hand-Object Interaction with Language Models | Jan 1, 2025 | Language Modeling, Language Modelling | Unverified | 0 |
| BIGS: Bimanual Category-agnostic Interaction Reconstruction from Monocular Videos via 3D Gaussian Splatting | Jan 1, 2025 | 3D Hand Pose Estimation, 3D Object Reconstruction | Unverified | 0 |