| Deep Learning-Based Object Pose Estimation: A Comprehensive Survey | May 13, 2024 | Deep Learning, Object | Code Available | 3 |
| DreamScene4D: Dynamic Multi-Object Scene Generation from Monocular Videos | May 3, 2024 | Depth Estimation, Depth Prediction | Code Available | 3 |
| Moving Object Segmentation: All You Need Is SAM (and Flow) | Apr 18, 2024 | All, Motion Segmentation | Code Available | 3 |
| ZeST: Zero-Shot Material Transfer from a Single Image | Apr 9, 2024 | Appearance Transfer, Object | Code Available | 3 |
| RCBEVDet: Radar-camera Fusion in Bird's Eye View for 3D Object Detection | Mar 25, 2024 | 3D Object Detection, 3D Object Detection (RoI) | Code Available | 3 |
| Benchmarks and Challenges in Pose Estimation for Egocentric Hand Interactions with Objects | Mar 25, 2024 | Action Recognition, Motion Generation | Code Available | 3 |
| Multiple Object Tracking as ID Prediction | Mar 25, 2024 | Multi-Object Tracking, Multiple Object Tracking | Code Available | 3 |
| MTP: Advancing Remote Sensing Foundation Model via Multi-Task Pretraining | Mar 20, 2024 | Aerial Scene Classification, Building change detection for remote sensing images | Code Available | 3 |
| ShapeLLM: Universal 3D Object Understanding for Embodied Interaction | Feb 27, 2024 | 3D geometry, 3D Object Captioning | Code Available | 3 |
| SGS-SLAM: Semantic Gaussian Splatting For Neural Dense SLAM | Feb 5, 2024 | 3D Semantic Segmentation, Camera Pose Estimation | Code Available | 3 |
| General Object Foundation Model for Images and Videos at Scale | Dec 14, 2023 | Instance Segmentation, Long-tail Video Object Segmentation | Code Available | 3 |
| MotionCtrl: A Unified and Flexible Motion Controller for Video Generation | Dec 6, 2023 | Object, Video Generation | Code Available | 3 |
| Putting the Object Back into Video Object Segmentation | Oct 19, 2023 | Object, Segmentation | Code Available | 3 |
| MagicDrive: Street View Generation with Diverse 3D Geometry Control | Oct 4, 2023 | 3D geometry, 3D Object Detection | Code Available | 3 |
| Segment Anything Meets Point Tracking | Jul 3, 2023 | Interactive Video Object Segmentation, Object | Code Available | 3 |
| SAM Fails to Segment Anything? -- SAM-Adapter: Adapting SAM in Underperformed Scenes: Camouflage, Shadow, Medical Image Segmentation, and More | Apr 18, 2023 | General Knowledge, Image Segmentation | Code Available | 3 |
| Geometric-aware Pretraining for Vision-centric 3D Object Detection | Apr 6, 2023 | 3D Object Detection, Autonomous Driving | Code Available | 3 |
| BundleSDF: Neural 6-DoF Tracking and 3D Reconstruction of Unknown Objects | Mar 24, 2023 | 3D Object Detection, 3D Object Tracking | Code Available | 3 |
| Universal Instance Perception as Object Discovery and Retrieval | Mar 12, 2023 | Described Object Detection, Generalized Referring Expression Comprehension | Code Available | 3 |
| Deep OC-SORT: Multi-Pedestrian Tracking by Adaptive Re-Identification | Feb 23, 2023 | Multi-Object Tracking, Object | Code Available | 3 |
| OmDet: Large-scale vision-language multi-dataset pre-training with multimodal detection network | Sep 10, 2022 | Continual Learning, Object | Code Available | 3 |
| BoT-SORT: Robust Associations Multi-Pedestrian Tracking | Jun 29, 2022 | Multi-Object Tracking, Object | Code Available | 3 |
| PETRv2: A Unified Framework for 3D Perception from Multi-Camera Images | Jun 2, 2022 | 3D Lane Detection, 3D Object Detection | Code Available | 3 |
| Observation-Centric SORT: Rethinking SORT for Robust Multi-Object Tracking | Mar 27, 2022 | CPU, Multi-Object Tracking | Code Available | 3 |
| PETR: Position Embedding Transformation for Multi-View 3D Object Detection | Mar 10, 2022 | 3D Object Detection, Object | Code Available | 3 |
| DALL-Eval: Probing the Reasoning Skills and Social Biases of Text-to-Image Generation Models | Feb 8, 2022 | Diagnostic, Image Captioning | Code Available | 3 |
| NeROIC: Neural Rendering of Objects from Online Image Collections | Jan 7, 2022 | Neural Rendering, Novel View Synthesis | Code Available | 3 |
| Motion Representations for Articulated Animation | Apr 22, 2021 | Object, Video Reconstruction | Code Available | 3 |
| Robust and Accurate Object Detection via Adversarial Learning | Mar 23, 2021 | AutoML, Data Augmentation | Code Available | 3 |
| A Comparative Analysis of Object Detection Metrics with a Companion Open-Source Toolkit | Jan 25, 2021 | Object, object-detection | Code Available | 3 |
| A Survey on Performance Metrics for Object-Detection Algorithms | Jul 21, 2020 | Benchmarking, Object | Code Available | 3 |
| YOLOv4: Optimal Speed and Accuracy of Object Detection | Apr 23, 2020 | BIG-bench Machine Learning, Data Augmentation | Code Available | 3 |
| First Order Motion Model for Image Animation | Feb 29, 2020 | Image Animation, model | Code Available | 3 |
| Bridging the Gap Between Anchor-based and Anchor-free Detection via Adaptive Training Sample Selection | Dec 5, 2019 | Object, object-detection | Code Available | 3 |
| EfficientDet: Scalable and Efficient Object Detection | Nov 20, 2019 | AutoML, Object | Code Available | 3 |
| DenseWorld-1M: Towards Detailed Dense Grounded Caption in the Real World | Jun 30, 2025 | Caption Generation, Object | Code Available | 2 |
| RGBTrack: Fast, Robust Depth-Free 6D Pose Estimation and Tracking | Jun 20, 2025 | 6D Pose Estimation, Object | Code Available | 2 |
| InstructSAM: A Training-Free Framework for Instruction-Oriented Remote Sensing Object Recognition | May 21, 2025 | Earth Observation, Object | Code Available | 2 |
| Dynamic Graph Induced Contour-aware Heat Conduction Network for Event-based Object Detection | May 19, 2025 | Event-based vision, Object | Code Available | 2 |
| NTIRE 2025 Challenge on Cross-Domain Few-Shot Object Detection: Methods and Results | Apr 14, 2025 | Cross-Domain Few-Shot, Cross-Domain Few-Shot Object Detection | Code Available | 2 |
| Objaverse++: Curated 3D Object Dataset with Quality Annotations | Apr 9, 2025 | 3D Generation, Attribute | Code Available | 2 |
| Caption Anything in Video: Fine-grained Object-centric Captioning via Spatiotemporal Multimodal Prompting | Apr 7, 2025 | Boundary Detection, Object | Code Available | 2 |
| InteractVLM: 3D Interaction Reasoning from 2D Foundational Models | Apr 7, 2025 | 3D Reconstruction, Object | Code Available | 2 |
| SAM2MOT: A Novel Paradigm of Multi-Object Tracking by Segmentation | Apr 6, 2025 | Multi-Object Tracking, Object | Code Available | 2 |
| COB-GS: Clear Object Boundaries in 3DGS Segmentation Based on Boundary-Adaptive Gaussian Splitting | Mar 25, 2025 | 3DGS, Object | Code Available | 2 |
| Rethinking End-to-End 2D to 3D Scene Segmentation in Gaussian Splatting | Mar 18, 2025 | Instance Segmentation, Object | Code Available | 2 |
| 4D LangSplat: 4D Language Gaussian Splatting via Multimodal Large Language Models | Mar 13, 2025 | Large Language Model, Object | Code Available | 2 |
| Omnidirectional Multi-Object Tracking | Mar 6, 2025 | Multi-Object Tracking, Object | Code Available | 2 |
| Find First, Track Next: Decoupling Identification and Propagation in Referring Video Object Segmentation | Mar 5, 2025 | Object, Referring Video Object Segmentation | Code Available | 2 |
| Diff9D: Diffusion-Based Domain-Generalized Category-Level 9-DoF Object Pose Estimation | Feb 4, 2025 | Denoising, Domain Generalization | Code Available | 2 |