| 4D LangSplat: 4D Language Gaussian Splatting via Multimodal Large Language Models | Mar 13, 2025 | Large Language Model, Object | Code Available | 2 |
| HazyDet: Open-source Benchmark for Drone-view Object Detection with Depth-cues in Hazy Scenes | Sep 30, 2024 | Object, object-detection | Code Available | 2 |
| HOISDF: Constraining 3D Hand-Object Pose Estimation with Global Signed Distance Fields | Feb 26, 2024 | 3D Hand Pose Estimation, hand-object pose | Code Available | 2 |
| HOLD: Category-agnostic 3D Reconstruction of Interacting Hands and Objects from Video | Nov 30, 2023 | 3D Reconstruction, Object | Code Available | 2 |
| Improving Text-guided Object Inpainting with Semantic Pre-inpainting | Sep 12, 2024 | Denoising, Object | Code Available | 2 |
| In Defense of Lazy Visual Grounding for Open-Vocabulary Semantic Segmentation | Aug 9, 2024 | Image to text, Object | Code Available | 2 |
| Gaussian Grouping: Segment and Edit Anything in 3D Scenes | Dec 1, 2023 | Colorization, NeRF | Code Available | 2 |
| ALBench: A Framework for Evaluating Active Learning in Object Detection | Jul 27, 2022 | Active Learning, image-classification | Code Available | 2 |
| DetGPT: Detect What You Need via Reasoning | May 23, 2023 | Autonomous Driving, Object | Code Available | 2 |
| InterDiff: Generating 3D Human-Object Interactions with Physics-Informed Diffusion | Aug 31, 2023 | 3D Human Dynamics, Human Dynamics | Code Available | 2 |
| Interpreting Object-level Foundation Models via Visual Precision Search | Nov 25, 2024 | Explainable Artificial Intelligence (XAI), Object | Code Available | 2 |
| Is CLIP the main roadblock for fine-grained open-world perception? | Apr 4, 2024 | Autonomous Driving, Novel Concepts | Code Available | 2 |
| Knowledge Distillation in YOLOX-ViT for Side-Scan Sonar Object Detection | Mar 14, 2024 | Knowledge Distillation, Novel Object Detection | Code Available | 2 |
| Autoregressive Visual Tracking | Jan 1, 2023 | Object, Object Tracking | Code Available | 2 |
| LayoutDiffusion: Controllable Diffusion Model for Layout-to-image Generation | Mar 30, 2023 | Image Generation, Layout-to-Image Generation | Code Available | 2 |
| Learning Embeddings with Centroid Triplet Loss for Object Identification in Robotic Grasping | Apr 9, 2024 | Image Retrieval, Object | Code Available | 2 |
| LeviTor: 3D Trajectory Oriented Image-to-Video Synthesis | Dec 19, 2024 | Object | Code Available | 2 |
| LeYOLO, New Scalable and Efficient CNN Architecture for Object Detection | Jun 20, 2024 | Computational Efficiency, Object | Code Available | 2 |
| Localization Distillation for Object Detection | Apr 12, 2022 | Knowledge Distillation, Object | Code Available | 2 |
| Detect Everything with Few Examples | Sep 22, 2023 | Binary Classification, Cross-Domain Few-Shot Object Detection | Code Available | 2 |
| DenseWorld-1M: Towards Detailed Dense Grounded Caption in the Real World | Jun 30, 2025 | Caption Generation, Object | Code Available | 2 |
| Magic Tokens: Select Diverse Tokens for Multi-modal Object Re-Identification | Mar 15, 2024 | Object | Code Available | 2 |
| MegaPose: 6D Pose Estimation of Novel Objects via Render & Compare | Dec 13, 2022 | 3D Object Detection, 6D Pose Estimation | Code Available | 2 |
| MeMOTR: Long-Term Memory-Augmented Transformer for Multi-Object Tracking | Jul 28, 2023 | Multi-Object Tracking, Multiple Object Tracking | Code Available | 2 |
| DetectoRS: Detecting Objects with Recursive Feature Pyramid and Switchable Atrous Convolution | Jun 3, 2020 | Instance Segmentation, Object | Code Available | 2 |
| MeViS: A Large-scale Benchmark for Video Segmentation with Motion Expressions | Aug 16, 2023 | Motion Expressions Guided Video Segmentation, Object | Code Available | 2 |
| 3DGS-CD: 3D Gaussian Splatting-based Change Detection for Physical Object Rearrangement | Nov 6, 2024 | 3DGS, Change Detection | Code Available | 2 |
| Mitigating Object Hallucinations in Large Vision-Language Models through Visual Contrastive Decoding | Nov 28, 2023 | Hallucination, Object | Code Available | 2 |
| MonoCD: Monocular 3D Object Detection with Complementary Depths | Apr 4, 2024 | 3D Object Detection, Depth Estimation | Code Available | 2 |
| Monocular 3D Object Detection with Depth from Motion | Jul 26, 2022 | 3D Object Detection, Depth Estimation | Code Available | 2 |
| Beyond MOT: Semantic Multi-Object Tracking | Mar 8, 2024 | Multi-Object Tracking, Object | Code Available | 2 |
| AGLA: Mitigating Object Hallucinations in Large Vision-Language Models with Assembly of Global and Local Attention | Jun 18, 2024 | Object, Response Generation | Code Available | 2 |
| Deep Snake for Real-Time Instance Segmentation | Jan 6, 2020 | GPU, Instance Segmentation | Code Available | 2 |
| DeMo: Decoupled Feature-Based Mixture of Experts for Multi-Modal Object Re-Identification | Dec 14, 2024 | Mixture-of-Experts, Object | Code Available | 2 |
| Aligning and Prompting Everything All at Once for Universal Visual Perception | Dec 4, 2023 | All, Object | Code Available | 2 |
| Dense Distinct Query for End-to-End Object Detection | Mar 22, 2023 | Object, object-detection | Code Available | 2 |
| Multi-Class Road User Detection With 3+1D Radar in the View-of-Delft Dataset | Apr 1, 2022 | 3D Object Detection, Benchmarking | Code Available | 2 |
| Multi-Grained Angle Representation for Remote Sensing Object Detection | Sep 7, 2022 | Object, object-detection | Code Available | 2 |
| DetZero: Rethinking Offboard 3D Object Detection with Long-term Sequential Point Clouds | Jun 9, 2023 | 3D Multi-Object Tracking, 3D Object Detection | Code Available | 2 |
| DAVE -- A Detect-and-Verify Paradigm for Low-Shot Counting | Apr 25, 2024 | Exemplar-Free Counting, Few-shot Object Counting and Detection | Code Available | 2 |
| NetTrack: Tracking Highly Dynamic Objects with a Net | Mar 17, 2024 | Multi-Object Tracking, Object | Code Available | 2 |
| NOPE: Novel Object Pose Estimation from a Single Image | Mar 23, 2023 | Object, Pose Estimation | Code Available | 2 |
| Objaverse++: Curated 3D Object Dataset with Quality Annotations | Apr 9, 2025 | 3D Generation, Attribute | Code Available | 2 |
| Segmentation Transformer: Object-Contextual Representations for Semantic Segmentation | Sep 24, 2019 | Decoder, Object | Code Available | 2 |
| BOP Challenge 2020 on 6D Object Localization | Sep 15, 2020 | 6D Pose Estimation, 6D Pose Estimation using RGB | Code Available | 2 |
| AccDiffusion: An Accurate Method for Higher-Resolution Image Generation | Jul 15, 2024 | Image Generation, Object | Code Available | 2 |
| OCNet: Object Context Network for Scene Parsing | Sep 4, 2018 | Object, Relation | Code Available | 2 |
| DAMSDet: Dynamic Adaptive Multispectral Detection Transformer with Competitive Query Selection and Adaptive Feature Fusion | Mar 1, 2024 | Object, object-detection | Code Available | 2 |
| Decoupling Features in Hierarchical Propagation for Video Object Segmentation | Oct 18, 2022 | Object, Semantic Segmentation | Code Available | 2 |
| Cross-View Referring Multi-Object Tracking | Dec 23, 2024 | Cross-view Referring Multi-Object Tracking, Multi-Object Tracking | Code Available | 2 |