| E2E-MFD: Towards End-to-End Synchronous Multimodal Fusion Detection | Mar 14, 2024 | Autonomous Driving, Object | Code Available | 2 |
| LISO: Lidar-only Self-Supervised 3D Object Detection | Mar 11, 2024 | 3D Object Detection, Object | Code Available | 2 |
| Poly Kernel Inception Network for Remote Sensing Detection | Mar 10, 2024 | Object, object-detection | Code Available | 2 |
| SAFDNet: A Simple and Effective Network for Fully Sparse 3D Object Detection | Mar 9, 2024 | 3D Object Detection, Autonomous Driving | Code Available | 2 |
| Beyond MOT: Semantic Multi-Object Tracking | Mar 8, 2024 | Multi-Object Tracking, Object | Code Available | 2 |
| VastTrack: Vast Category Visual Object Tracking | Mar 6, 2024 | Object, Object Tracking | Code Available | 2 |
| EAGLE: Eigen Aggregation Learning for Object-Centric Unsupervised Semantic Segmentation | Mar 3, 2024 | Object, Representation Learning | Code Available | 2 |
| HALC: Object Hallucination Reduction via Adaptive Focal-Contrast Decoding | Mar 1, 2024 | Hallucination, Object | Code Available | 2 |
| DAMSDet: Dynamic Adaptive Multispectral Detection Transformer with Competitive Query Selection and Adaptive Feature Fusion | Mar 1, 2024 | Object, object-detection | Code Available | 2 |
| FusionVision: A comprehensive approach of 3D object reconstruction and segmentation from RGB-D cameras using YOLO and fast segment anything | Feb 29, 2024 | 3D Object Reconstruction, Instance Segmentation | Code Available | 2 |
| HOISDF: Constraining 3D Hand-Object Pose Estimation with Global Signed Distance Fields | Feb 26, 2024 | 3D Hand Pose Estimation, hand-object pose | Code Available | 2 |
| Grasp, See, and Place: Efficient Unknown Object Rearrangement with Policy Structure Prior | Feb 23, 2024 | Object, Object Rearrangement | Code Available | 2 |
| VOOM: Robust Visual Object Odometry and Mapping using Hierarchical Landmarks | Feb 21, 2024 | Computational Efficiency, Object | Code Available | 2 |
| Open3DSG: Open-Vocabulary 3D Scene Graphs from Point Clouds with Queryable Objects and Open-Set Relationships | Feb 19, 2024 | 3d scene graph generation, Object | Code Available | 2 |
| CoLLaVO: Crayon Large Language and Vision mOdel | Feb 17, 2024 | Large Language Model, model | Code Available | 2 |
| FM-Fusion: Instance-aware Semantic Mapping Boosted by Vision-Language Foundation Models | Feb 7, 2024 | Instance Segmentation, Object | Code Available | 2 |
| YOLOPoint Joint Keypoint and Object Detection | Feb 6, 2024 | Object, object-detection | Code Available | 2 |
| Cross-Domain Few-Shot Object Detection via Enhanced Open-Set Object Detector | Feb 5, 2024 | Cross-Domain Few-Shot, Cross-Domain Few-Shot Object Detection | Code Available | 2 |
| HASSOD: Hierarchical Adaptive Self-Supervised Object Detection | Feb 5, 2024 | Object, object-detection | Code Available | 2 |
| MF-MOS: A Motion-Focused Model for Moving Object Segmentation | Jan 30, 2024 | Autonomous Driving, Object | Code Available | 2 |
| Beyond the Contact: Discovering Comprehensive Affordance for 3D Objects from Pre-trained 2D Diffusion Models | Jan 23, 2024 | Human-Object Interaction Detection, Object | Code Available | 2 |
| Removal then Selection: A Coarse-to-Fine Fusion Perspective for RGB-Infrared Object Detection | Jan 19, 2024 | Multispectral Object Detection, Object | Code Available | 2 |
| Efficient4D: Fast Dynamic 3D Object Generation from a Single-view Video | Jan 16, 2024 | Image Generation, Image to 3D | Code Available | 2 |
| OBSeg: Accurate and Fast Instance Segmentation Framework Using Segmentation Foundation Models with Oriented Bounding Box Prompts | Jan 16, 2024 | Amodal Instance Segmentation, Instance Segmentation | Code Available | 2 |
| Fine-Grained Prototypes Distillation for Few-Shot Object Detection | Jan 15, 2024 | Few-Shot Object Detection, Meta-Learning | Code Available | 2 |
| RoboFusion: Towards Robust Multi-Modal 3D Object Detection via SAM | Jan 8, 2024 | 3D Object Detection, Autonomous Driving | Code Available | 2 |
| MS-DETR: Efficient DETR Training with Mixed Supervision | Jan 8, 2024 | Decoder, Object | Code Available | 2 |
| Context-Guided Spatio-Temporal Video Grounding | Jan 3, 2024 | Object, Spatio-Temporal Video Grounding | Code Available | 2 |
| Unveiling Parts Beyond Objects: Towards Finer-Granularity Referring Expression Segmentation | Jan 1, 2024 | Descriptive, Object | Code Available | 2 |
| Exploring Orthogonality in Open World Object Detection | Jan 1, 2024 | Incremental Learning, Object | Code Available | 2 |
| Point Segment and Count: A Generalized Framework for Object Counting | Jan 1, 2024 | Few-shot Object Counting and Detection, Knowledge Distillation | Code Available | 2 |
| ARTrackV2: Prompting Autoregressive Tracker Where to Look and How to Describe | Dec 28, 2023 | Object, Object Tracking | Code Available | 2 |
| UniRef++: Segment Every Reference Object in Spatial and Temporal Spaces | Dec 25, 2023 | Image Segmentation, Object | Code Available | 2 |
| Prototype-based Cross-Modal Object Tracking | Dec 22, 2023 | Object, Object Tracking | Code Available | 2 |
| VCoder: Versatile Vision Encoders for Multimodal Large Language Models | Dec 21, 2023 | Image Captioning, Image Generation | Code Available | 2 |
| UCMCTrack: Multi-Object Tracking with Uniform Camera Motion Compensation | Dec 14, 2023 | Motion Compensation, Multi-Object Tracking | Code Available | 2 |
| Chat-Scene: Bridging 3D Scene and Large Language Models with Object Identifiers | Dec 13, 2023 | 3D Question Answering (3D-QA), Attribute | Code Available | 2 |
| SAM-Assisted Remote Sensing Imagery Semantic Segmentation with Object and Boundary Constraints | Dec 5, 2023 | Model Optimization, Novel Concepts | Code Available | 2 |
| Aligning and Prompting Everything All at Once for Universal Visual Perception | Dec 4, 2023 | All, Object | Code Available | 2 |
| ImageDream: Image-Prompt Multi-view Diffusion for 3D Generation | Dec 2, 2023 | 3D Generation, Object | Code Available | 2 |
| Gaussian Grouping: Segment and Edit Anything in 3D Scenes | Dec 1, 2023 | Colorization, NeRF | Code Available | 2 |
| TrackDiffusion: Tracklet-Conditioned Video Generation via Diffusion Models | Dec 1, 2023 | Image Classification, Multi-Object Tracking | Code Available | 2 |
| HOLD: Category-agnostic 3D Reconstruction of Interacting Hands and Objects from Video | Nov 30, 2023 | 3D Reconstruction, Object | Code Available | 2 |
| A Graph-Based Approach for Category-Agnostic Pose Estimation | Nov 29, 2023 | 2D Pose Estimation, Animal Pose Estimation | Code Available | 2 |
| Mitigating Object Hallucinations in Large Vision-Language Models through Visual Contrastive Decoding | Nov 28, 2023 | Hallucination, Object | Code Available | 2 |
| SAM-6D: Segment Anything Model Meets Zero-Shot 6D Object Pose Estimation | Nov 27, 2023 | 6D Pose Estimation using RGB, Instance Segmentation | Code Available | 2 |
| Open-Vocabulary Camouflaged Object Segmentation | Nov 19, 2023 | Camouflaged Object Segmentation, Image Segmentation | Code Available | 2 |
| Clearer Frames, Anytime: Resolving Velocity Ambiguity in Video Frame Interpolation | Nov 14, 2023 | Object, Video Editing | Code Available | 2 |
| GenEval: An Object-Focused Framework for Evaluating Text-to-Image Alignment | Oct 17, 2023 | Attribute, Object | Code Available | 2 |
| CoDA: Collaborative Novel Box Discovery and Cross-modal Alignment for Open-vocabulary 3D Object Detection | Oct 4, 2023 | 3D Object Detection, cross-modal alignment | Code Available | 2 |