| Analyzing and Boosting the Power of Fine-Grained Visual Recognition for Multi-modal Large Language Models | Jan 25, 2025 | Attribute, Contrastive Learning | Code Available | 2 |
| One-shot 3D Object Canonicalization based on Geometric and Semantic Consistency | Jan 1, 2025 | Object | Code Available | 2 |
| RORem: Training a Robust Object Remover with Human-in-the-Loop | Jan 1, 2025 | Object | Code Available | 2 |
| YOLO-UniOW: Efficient Universal Open-World Object Detection | Dec 30, 2024 | Incremental Learning, Object | Code Available | 2 |
| CGCOD: Class-Guided Camouflaged Object Detection | Dec 25, 2024 | Object, object-detection | Code Available | 2 |
| Cross-View Referring Multi-Object Tracking | Dec 23, 2024 | Cross-view Referring Multi-Object Tracking, Multi-Object Tracking | Code Available | 2 |
| LeviTor: 3D Trajectory Oriented Image-to-Video Synthesis | Dec 19, 2024 | Object | Code Available | 2 |
| Multi-Sensor Object Anomaly Detection: Unifying Appearance, Geometry, and Internal Properties | Dec 19, 2024 | Anomaly Detection, Object | Code Available | 2 |
| RelationField: Relate Anything in Radiance Fields | Dec 18, 2024 | 3d scene graph generation, Graph Generation | Code Available | 2 |
| Exploring Enhanced Contextual Information for Video-Level Object Tracking | Dec 15, 2024 | Object, Object Tracking | Code Available | 2 |
| MambaPro: Multi-Modal Object Re-Identification with Mamba Aggregation and Synergistic Prompt | Dec 14, 2024 | Mamba, Object | Code Available | 2 |
| DeMo: Decoupled Feature-Based Mixture of Experts for Multi-Modal Object Re-Identification | Dec 14, 2024 | Mixture-of-Experts, Object | Code Available | 2 |
| RemDet: Rethinking Efficient Model Design for UAV Object Detection | Dec 13, 2024 | Object, object-detection | Code Available | 2 |
| Object Detection using Event Camera: A MoE Heat Conduction based Detector and A New Benchmark Dataset | Dec 9, 2024 | Computational Efficiency, Mixture-of-Experts | Code Available | 2 |
| DEYOLO: Dual-Feature-Enhancement YOLO for Cross-Modality Object Detection | Dec 6, 2024 | Object, object-detection | Code Available | 2 |
| SADG: Segment Any Dynamic Gaussian Without Object Trackers | Nov 28, 2024 | 3D Reconstruction, Autonomous Driving | Code Available | 2 |
| Lost & Found: Tracking Changes from Egocentric Observations in 3D Dynamic Scene Graphs | Nov 28, 2024 | Object | Code Available | 2 |
| DreamMix: Decoupling Object Attributes for Enhanced Editability in Customized Image Inpainting | Nov 26, 2024 | Attribute, Diversity | Code Available | 2 |
| Interpreting Object-level Foundation Models via Visual Precision Search | Nov 25, 2024 | Explainable Artificial Intelligence (XAI), Object | Code Available | 2 |
| Open Vocabulary Monocular 3D Object Detection | Nov 25, 2024 | 3D Object Detection, Monocular 3D Object Detection | Code Available | 2 |
| EasyHOI: Unleashing the Power of Large Models for Reconstructing Hand-Object Interactions in the Wild | Nov 21, 2024 | 3D Reconstruction, Object | Code Available | 2 |
| Find Any Part in 3D | Nov 20, 2024 | 3D Part Segmentation, Diversity | Code Available | 2 |
| Token Merging for Training-Free Semantic Binding in Text-to-Image Synthesis | Nov 11, 2024 | Attribute, Image Generation | Code Available | 2 |
| 3DGS-CD: 3D Gaussian Splatting-based Change Detection for Physical Object Rearrangement | Nov 6, 2024 | 3DGS, Change Detection | Code Available | 2 |
| Exploiting Unlabeled Data with Multiple Expert Teachers for Open Vocabulary Aerial Object Detection and Its Orientation Adaptation | Nov 4, 2024 | Earth Observation, Object | Code Available | 2 |
| MonoDGP: Monocular 3D Object Detection with Decoupled-Query and Geometry-Error Priors | Oct 25, 2024 | 3D Object Detection, Depth Estimation | Code Available | 2 |
| DI-MaskDINO: A Joint Object Detection and Instance Segmentation Model | Oct 22, 2024 | Decoder, Instance Segmentation | Code Available | 2 |
| Mitigating Object Hallucination via Concentric Causal Attention | Oct 21, 2024 | Hallucination, Object | Code Available | 2 |
| VLM-Grounder: A VLM Agent for Zero-Shot 3D Visual Grounding | Oct 17, 2024 | 3D geometry, 3D visual grounding | Code Available | 2 |
| Open World Object Detection: A Survey | Oct 15, 2024 | Incremental Learning, Object | Code Available | 2 |
| Multiview Scene Graph | Oct 15, 2024 | Decoder, Object | Code Available | 2 |
| High-Precision Dichotomous Image Segmentation via Probing Diffusion Capacity | Oct 14, 2024 | Denoising, Dichotomous Image Segmentation | Code Available | 2 |
| Towards Interpreting Visual Information Processing in Vision-Language Models | Oct 9, 2024 | Language Modeling, Language Modelling | Code Available | 2 |
| HazyDet: Open-source Benchmark for Drone-view Object Detection with Depth-cues in Hazy Scenes | Sep 30, 2024 | Object, object-detection | Code Available | 2 |
| One Token to Seg Them All: Language Instructed Reasoning Segmentation in Videos | Sep 29, 2024 | All, Image Segmentation | Code Available | 2 |
| A Novel Unified Architecture for Low-Shot Counting by Detection and Segmentation | Sep 27, 2024 | Exemplar-Free Counting, Few-shot Object Counting and Detection | Code Available | 2 |
| Resolving Multi-Condition Confusion for Finetuning-Free Personalized Image Generation | Sep 26, 2024 | Image Generation, Object | Code Available | 2 |
| Robot See Robot Do: Imitating Articulated Object Manipulation with Monocular 4D Reconstruction | Sep 26, 2024 | 4D reconstruction, Object | Code Available | 2 |
| Source-Free Domain Adaptation for YOLO Object Detection | Sep 25, 2024 | Domain Adaptation, Model Selection | Code Available | 2 |
| Progressive Representation Learning for Real-Time UAV Tracking | Sep 25, 2024 | Object, Object Tracking | Code Available | 2 |
| RockTrack: A 3D Robust Multi-Camera-Ken Multi-Object Tracking Framework | Sep 18, 2024 | 3D Multi-Object Tracking, 3D Object Detection | Code Available | 2 |
| Improving Text-guided Object Inpainting with Semantic Pre-inpainting | Sep 12, 2024 | Denoising, Object | Code Available | 2 |
| UniDet3D: Multi-dataset Indoor 3D Object Detection | Sep 6, 2024 | 3D Object Detection, Object | Code Available | 2 |
| UTrack: Multi-Object Tracking with Uncertain Detections | Aug 30, 2024 | Autonomous Driving, Multi-Object Tracking | Code Available | 2 |
| Unleashing the Temporal-Spatial Reasoning Capacity of GPT for Training-Free Audio and Language Referenced Video Object Segmentation | Aug 28, 2024 | Object, Semantic Segmentation | Code Available | 2 |
| GOReloc: Graph-based Object-Level Relocalization for Visual SLAM | Aug 15, 2024 | Object, object-detection | Code Available | 2 |
| In Defense of Lazy Visual Grounding for Open-Vocabulary Semantic Segmentation | Aug 9, 2024 | Image to text, Object | Code Available | 2 |
| Visible-Thermal Multiple Object Tracking: Large-scale Video Dataset and Progressive Fusion Approach | Aug 2, 2024 | cross-modal alignment, Multiple Object Tracking | Code Available | 2 |
| ESOD: Efficient Small Object Detection on High-Resolution Images | Jul 23, 2024 | GPU, Object | Code Available | 2 |
| MonoWAD: Weather-Adaptive Diffusion Model for Robust Monocular 3D Object Detection | Jul 23, 2024 | 3D Object Detection, Autonomous Driving | Code Available | 2 |