| OLViT: Multi-Modal State Tracking via Attention-Based Embeddings for Video-Grounded Dialog | Feb 20, 2024 | Object, Object Tracking | Unverified | 0 |
| GOOD: Towards Domain Generalized Orientated Object Detection | Feb 20, 2024 | Hallucination, Object | Unverified | 0 |
| MuLan: Multimodal-LLM Agent for Progressive and Interactive Multi-Object Diffusion | Feb 20, 2024 | Attribute, Language Modeling | Code Available | 1 |
| CST: Calibration Side-Tuning for Parameter and Memory Efficient Transfer Learning | Feb 20, 2024 | GPU, Object | Unverified | 0 |
| Efficient Parameter Mining and Freezing for Continual Object Detection | Feb 20, 2024 | Continual Learning, Incremental Learning | Unverified | 0 |
| Slot-VLM: SlowFast Slots for Video-Language Modeling | Feb 20, 2024 | Language Modeling, Language Modelling | Unverified | 0 |
| Object-level Geometric Structure Preserving for Natural Image Stitching | Feb 20, 2024 | Image Stitching, Object | Code Available | 1 |
| DINOBot: Robot Manipulation via Retrieval and Alignment with Vision Foundation Models | Feb 20, 2024 | Imitation Learning, Object | Unverified | 0 |
| Visual Reasoning in Object-Centric Deep Neural Networks: A Comparative Cognition Approach | Feb 20, 2024 | Object, Relational Reasoning | Code Available | 0 |
| Open3DSG: Open-Vocabulary 3D Scene Graphs from Point Clouds with Queryable Objects and Open-Set Relationships | Feb 19, 2024 | 3d scene graph generation, Object | Code Available | 2 |
| UncertaintyTrack: Exploiting Detection and Localization Uncertainty in Multi-Object Tracking | Feb 19, 2024 | Autonomous Driving, Multi-Object Tracking | Code Available | 1 |
| Logical Closed Loop: Uncovering Object Hallucinations in Large Vision-Language Models | Feb 18, 2024 | Hallucination, Object | Code Available | 1 |
| CoLLaVO: Crayon Large Language and Vision mOdel | Feb 17, 2024 | Large Language Model, model | Code Available | 2 |
| GaussianObject: High-Quality 3D Object Reconstruction from Four Views with Gaussian Splatting | Feb 15, 2024 | 3D Object Reconstruction, Neural Rendering | Code Available | 5 |
| Lester: rotoscope animation through video object segmentation and tracking | Feb 15, 2024 | 3D Human Pose Estimation, Object | Code Available | 1 |
| Detecting Anomalous Events in Object-centric Business Processes via Graph Neural Networks | Feb 14, 2024 | Anomaly Detection, Object | Code Available | 0 |
| Moving Object Proposals with Deep Learned Optical Flow for Video Object Segmentation | Feb 14, 2024 | Decoder, Object | Unverified | 0 |
| Few-Shot Object Detection with Sparse Context Transformers | Feb 14, 2024 | Few-Shot Object Detection, Object | Unverified | 0 |
| H2O-SDF: Two-phase Learning for 3D Indoor Reconstruction using Object Surface Fields | Feb 13, 2024 | Indoor Scene Reconstruction, NeRF | Code Available | 0 |
| Leveraging Self-Supervised Instance Contrastive Learning for Radar Object Detection | Feb 13, 2024 | Contrastive Learning, Object | Unverified | 0 |
| Unsupervised Discovery of Object-Centric Neural Fields | Feb 12, 2024 | Object, Object Discovery | Unverified | 0 |
| Exploring Perceptual Limitation of Multimodal Large Language Models | Feb 12, 2024 | Object, Question Answering | Code Available | 1 |
| GBOT: Graph-Based 3D Object Tracking for Augmented Reality-Assisted Assembly Guidance | Feb 12, 2024 | 3D Object Tracking, 6D Pose Estimation | Code Available | 1 |
| Semantic Object-level Modeling for Robust Visual Camera Relocalization | Feb 10, 2024 | Camera Relocalization, Object | Unverified | 0 |
| Transfer learning with generative models for object detection on limited datasets | Feb 9, 2024 | Geophysics, Object | Unverified | 0 |
| Event-to-Video Conversion for Overhead Object Detection | Feb 9, 2024 | Object, object-detection | Unverified | 0 |
| Improving 2D-3D Dense Correspondences with Diffusion Models for 6D Object Pose Estimation | Feb 9, 2024 | 6D Pose Estimation using RGB, Benchmarking | Unverified | 0 |
| Point-VOS: Pointing Up Video Object Segmentation | Feb 8, 2024 | Object, Semantic Segmentation | Unverified | 0 |
| CLIP-Loc: Multi-modal Landmark Association for Global Localization in Object-based Maps | Feb 8, 2024 | Language Modeling, Language Modelling | Unverified | 0 |
| InstaGen: Enhancing Object Detection by Training on Synthetic Dataset | Feb 8, 2024 | Object, object-detection | Unverified | 0 |
| Extending 6D Object Pose Estimators for Stereo Vision | Feb 8, 2024 | 6D Pose Estimation, 6D Pose Estimation using RGB | Unverified | 0 |
| FuncGrasp: Learning Object-Centric Neural Grasp Functions from Single Annotated Example Object | Feb 8, 2024 | Object | Unverified | 0 |
| Binding Dynamics in Rotating Features | Feb 8, 2024 | Object | Unverified | 0 |
| NCRF: Neural Contact Radiance Fields for Free-Viewpoint Rendering of Hand-Object Interaction | Feb 8, 2024 | hand-object pose, Novel View Synthesis | Unverified | 0 |
| Text2Street: Controllable Text-to-image Generation for Street Views | Feb 7, 2024 | Image Generation, Layout Generation | Unverified | 0 |
| FM-Fusion: Instance-aware Semantic Mapping Boosted by Vision-Language Foundation Models | Feb 7, 2024 | Instance Segmentation, Object | Code Available | 2 |
| Color Recognition in Challenging Lighting Environments: CNN Approach | Feb 7, 2024 | Edge Detection, Image Segmentation | Unverified | 0 |
| Shape-biased Texture Agnostic Representations for Improved Textureless and Metallic Object Detection and 6D Pose Estimation | Feb 7, 2024 | 6D Pose Estimation, Object | Code Available | 0 |
| Tactile-based Object Retrieval From Granular Media | Feb 7, 2024 | Object, Retrieval | Unverified | 0 |
| Toward Accurate Camera-based 3D Object Detection via Cascade Depth Estimation and Calibration | Feb 7, 2024 | 3D Object Detection, Denoising | Unverified | 0 |
| YOLOPoint Joint Keypoint and Object Detection | Feb 6, 2024 | Object, object-detection | Code Available | 2 |
| Open-Universe Indoor Scene Generation using LLM Program Synthesis and Uncurated Object Databases | Feb 5, 2024 | Layout Generation, Object | Unverified | 0 |
| Cross-Domain Few-Shot Object Detection via Enhanced Open-Set Object Detector | Feb 5, 2024 | Cross-Domain Few-Shot, Cross-Domain Few-Shot Object Detection | Code Available | 2 |
| Direct-a-Video: Customized Video Generation with User-Directed Camera Movement and Object Motion | Feb 5, 2024 | Object, Video Generation | Unverified | 0 |
| DexDiffuser: Generating Dexterous Grasps with Diffusion Models | Feb 5, 2024 | Denoising, Grasp Generation | Unverified | 0 |
| Extreme Two-View Geometry From Object Poses with Diffusion Models | Feb 5, 2024 | Camera Pose Estimation, Object | Code Available | 1 |
| HASSOD: Hierarchical Adaptive Self-Supervised Object Detection | Feb 5, 2024 | Object, object-detection | Code Available | 2 |
| SGS-SLAM: Semantic Gaussian Splatting For Neural Dense SLAM | Feb 5, 2024 | 3D Semantic Segmentation, Camera Pose Estimation | Code Available | 3 |
| NOAH: Learning Pairwise Object Category Attentions for Image Classification | Feb 4, 2024 | Classification, image-classification | Code Available | 1 |
| CoFiNet: Unveiling Camouflaged Objects with Multi-Scale Finesse | Feb 3, 2024 | Image Segmentation, Object | Unverified | 0 |