| One Token to Seg Them All: Language Instructed Reasoning Segmentation in Videos | Sep 29, 2024 | All, Image Segmentation | Code Available | 2 |
| 1st Place Solution to the 8th HANDS Workshop Challenge -- ARCTIC Track: 3DGS-based Bimanual Category-agnostic Interaction Reconstruction | Sep 28, 2024 | 3DGS, Object | Unverified | 0 |
| CAFF-DINO: Multi-spectral object detection transformers with cross-attention features fusion | Sep 27, 2024 | Multispectral Object Detection, Object | Unverified | 0 |
| An Overview of Multi-Object Estimation via Labeled Random Finite Set | Sep 27, 2024 | Multi-Object Tracking, Object | Unverified | 0 |
| Improving Visual Object Tracking through Visual Prompting | Sep 27, 2024 | Object | Code Available | 1 |
| A Novel Unified Architecture for Low-Shot Counting by Detection and Segmentation | Sep 27, 2024 | Exemplar-Free Counting, Few-shot Object Counting and Detection | Code Available | 2 |
| Query matching for spatio-temporal action detection with query-based object detector | Sep 27, 2024 | Action Detection, Object | Unverified | 0 |
| Search3D: Hierarchical Open-Vocabulary 3D Segmentation | Sep 27, 2024 | 3D Instance Segmentation, 3D Part Segmentation | Unverified | 0 |
| You Only Speak Once to See | Sep 27, 2024 | Contrastive Learning, Object | Unverified | 0 |
| Omni6D: Large-Vocabulary 3D Object Dataset for Category-Level 6D Object Pose Estimation | Sep 26, 2024 | 6D Pose Estimation, 6D Pose Estimation using RGB | Code Available | 1 |
| Search and Detect: Training-Free Long Tail Object Detection via Web-Image Retrieval | Sep 26, 2024 | Image Retrieval, Object | Unverified | 0 |
| Advancing Object Detection in Transportation with Multimodal Large Language Models (MLLMs): A Comprehensive Review and Empirical Testing | Sep 26, 2024 | Event Detection, Object | Unverified | 0 |
| SOAR: Self-supervision Optimized UAV Action Recognition with Efficient Object-Aware Pretraining | Sep 26, 2024 | Action Recognition, Object | Unverified | 0 |
| Amodal Instance Segmentation with Diffusion Shape Prior Estimation | Sep 26, 2024 | Amodal Instance Segmentation, Instance Segmentation | Unverified | 0 |
| Robot See Robot Do: Imitating Articulated Object Manipulation with Monocular 4D Reconstruction | Sep 26, 2024 | 4D reconstruction, Object | Code Available | 2 |
| General Compression Framework for Efficient Transformer Object Tracking | Sep 26, 2024 | Model Compression, Object | Unverified | 0 |
| Resolving Multi-Condition Confusion for Finetuning-Free Personalized Image Generation | Sep 26, 2024 | Image Generation, Object | Code Available | 2 |
| CAMOT: Camera Angle-aware Multi-Object Tracking | Sep 26, 2024 | Multi-Object Tracking, Object | Unverified | 0 |
| Hand-object reconstruction via interaction-aware graph attention mechanism | Sep 26, 2024 | Graph Attention, Graph Neural Network | Unverified | 0 |
| A Grasping Movement Intention Estimator for Intuitive Control of Assistive Devices | Sep 25, 2024 | Object | Unverified | 0 |
| Transient Adversarial 3D Projection Attacks on Object Detection in Autonomous Driving | Sep 25, 2024 | Autonomous Driving, Object | Unverified | 0 |
| Go-SLAM: Grounded Object Segmentation and Localization with Gaussian Splatting SLAM | Sep 25, 2024 | 3D Scene Reconstruction, Object | Unverified | 0 |
| Source-Free Domain Adaptation for YOLO Object Detection | Sep 25, 2024 | Domain Adaptation, Model Selection | Code Available | 2 |
| Progressive Representation Learning for Real-Time UAV Tracking | Sep 25, 2024 | Object, Object Tracking | Code Available | 2 |
| Generative Object Insertion in Gaussian Splatting with a Multi-View Diffusion Model | Sep 25, 2024 | 3D Reconstruction, Object | Code Available | 1 |
| Underwater Camouflaged Object Tracking Meets Vision-Language SAM2 | Sep 25, 2024 | Object, Object Tracking | Code Available | 5 |
| A Versatile and Differentiable Hand-Object Interaction Representation | Sep 25, 2024 | Mixed Reality, Object | Unverified | 0 |
| UICE-MIRNet guided image enhancement for underwater object detection | Sep 24, 2024 | feature selection, Image Enhancement | Unverified | 0 |
| OW-Rep: Open World Object Detection with Instance Representation Learning | Sep 24, 2024 | Novel Class Discovery, Object | Unverified | 0 |
| Mind the Prompt: A Novel Benchmark for Prompt-based Class-Agnostic Counting | Sep 24, 2024 | Object, Object Counting | Code Available | 1 |
| LaPose: Laplacian Mixture Shape Modeling for RGB-Based Category-Level Object Pose Estimation | Sep 24, 2024 | Object, Pose Estimation | Code Available | 1 |
| Tiny Robotics Dataset and Benchmark for Continual Object Detection | Sep 24, 2024 | Autonomous Navigation, Continual Learning | Code Available | 0 |
| Towards Robust Object Detection: Identifying and Removing Backdoors via Module Inconsistency Analysis | Sep 24, 2024 | backdoor defense, Object | Unverified | 0 |
| Articulated Object Manipulation using Online Axis Estimation with SAM2-Based Tracking | Sep 24, 2024 | Object | Unverified | 0 |
| DecoupleNet: A Lightweight Backbone Network With Efficient Feature Decoupling for Remote Sensing Visual Tasks | Sep 23, 2024 | ARC, Computational Efficiency | Code Available | 1 |
| SOS: Segment Object System for Open-World Instance Segmentation With Object Priors | Sep 22, 2024 | Instance Segmentation, Object | Unverified | 0 |
| A Bottom-Up Approach to Class-Agnostic Image Segmentation | Sep 20, 2024 | Image Segmentation, Metric Learning | Unverified | 0 |
| Formula-Supervised Visual-Geometric Pre-training | Sep 20, 2024 | 3D Object Classification, 3D Object Recognition | Unverified | 0 |
| Learning to Play Video Games with Intuitive Physics Priors | Sep 20, 2024 | Decision Making, Object | Unverified | 0 |
| Interpretable Action Recognition on Hard to Classify Actions | Sep 19, 2024 | Action Recognition, Depth Estimation | Unverified | 0 |
| Frequency-Guided Spatial Adaptation for Camouflaged Object Detection | Sep 19, 2024 | Object, object-detection | Unverified | 0 |
| PoTATO: A Dataset for Analyzing Polarimetric Traces of Afloat Trash Objects | Sep 19, 2024 | Object, object-detection | Code Available | 0 |
| End-to-end Open-vocabulary Video Visual Relationship Detection using Multi-modal Prompting | Sep 19, 2024 | Decoder, Object | Unverified | 0 |
| SIM-OFE: Structure Information Mining and Object-aware Feature Enhancement for Fine-Grained Visual Categorization | Sep 18, 2024 | Fine-Grained Image Classification, Fine-Grained Visual Categorization | Unverified | 0 |
| FAST GDRNPP: Improving the Speed of State-of-the-Art 6D Object Pose Estimation | Sep 18, 2024 | 6D Pose Estimation using RGB, Object | Unverified | 0 |
| One Map to Find Them All: Real-time Open-Vocabulary Mapping for Zero-shot Multi-Object Navigation | Sep 18, 2024 | All, Object | Unverified | 0 |
| Towards Global Localization using Multi-Modal Object-Instance Re-Identification | Sep 18, 2024 | Camera Localization, Object | Code Available | 0 |
| End-to-End Probabilistic Geometry-Guided Regression for 6DoF Object Pose Estimation | Sep 18, 2024 | 6D Pose Estimation using RGB, Object | Unverified | 0 |
| DETECLAP: Enhancing Audio-Visual Representation Learning with Object Information | Sep 18, 2024 | Object, Representation Learning | Unverified | 0 |
| Representing Positional Information in Generative World Models for Object Manipulation | Sep 18, 2024 | Object | Unverified | 0 |