| ESOD: Efficient Small Object Detection on High-Resolution Images | Jul 23, 2024 | GPU, Object | Code Available | 2 |
| AccDiffusion: An Accurate Method for Higher-Resolution Image Generation | Jul 15, 2024 | Image Generation, Object | Code Available | 2 |
| OPEN: Object-wise Position Embedding for Multi-view 3D Object Detection | Jul 15, 2024 | 3D Object Detection, Depth Estimation | Code Available | 2 |
| InstructLayout: Instruction-Driven 2D and 3D Layout Synthesis with Semantic Graph Prior | Jul 10, 2024 | Benchmarking, Decoder | Code Available | 2 |
| SOOD++: Leveraging Unlabeled Data to Boost Oriented Object Detection | Jul 1, 2024 | Object, object-detection | Code Available | 2 |
| SeFlow: A Self-Supervised Scene Flow Method in Autonomous Driving | Jul 1, 2024 | Autonomous Driving, Autonomous Vehicles | Code Available | 2 |
| CORE4D: A 4D Human-Object-Human Interaction Dataset for Collaborative Object REarrangement | Jun 27, 2024 | Human-Object Interaction Detection, Human-Object Interaction Generation | Code Available | 2 |
| MG-LLaVA: Towards Multi-Granularity Visual Instruction Tuning | Jun 25, 2024 | Object, Object Recognition | Code Available | 2 |
| LeYOLO, New Scalable and Efficient CNN Architecture for Object Detection | Jun 20, 2024 | Computational Efficiency, Object | Code Available | 2 |
| AGLA: Mitigating Object Hallucinations in Large Vision-Language Models with Assembly of Global and Local Attention | Jun 18, 2024 | Object, Response Generation | Code Available | 2 |
| Task Me Anything | Jun 17, 2024 | 2k, Attribute | Code Available | 2 |
| Duoduo CLIP: Efficient 3D Understanding with Multi-View Images | Jun 17, 2024 | GPU, Object | Code Available | 2 |
| Make It Count: Text-to-Image Generation with an Accurate Number of Objects | Jun 14, 2024 | Denoising, Image Generation | Code Available | 2 |
| CLIPAway: Harmonizing Focused Embeddings for Removing Objects via Diffusion Models | Jun 13, 2024 | Object | Code Available | 2 |
| STAR: A First-Ever Dataset and A Large-Scale Benchmark for Scene Graph Generation in Large-Size Satellite Imagery | Jun 13, 2024 | Graph Generation, Object | Code Available | 2 |
| Understanding Sounds, Missing the Questions: The Challenge of Object Hallucination in Large Audio-Language Models | Jun 12, 2024 | Audio captioning, Hallucination | Code Available | 2 |
| REPARO: Compositional 3D Assets Generation with Differentiable 3D Layout Alignment | May 28, 2024 | Image to 3D, Object | Code Available | 2 |
| VoCoT: Unleashing Visually Grounded Multi-Step Reasoning in Large Multi-Modal Models | May 27, 2024 | Object | Code Available | 2 |
| REACT: Real-time Efficiency and Accuracy Compromise for Tradeoffs in Scene Graph Generation | May 25, 2024 | Graph Generation, Object | Code Available | 2 |
| SpecDETR: A Transformer-based Hyperspectral Point Object Detection Network | May 16, 2024 | Binary Classification, Decoder | Code Available | 2 |
| UnSAMFlow: Unsupervised Optical Flow Guided by Segment Anything Model | May 4, 2024 | Object, Optical Flow Estimation | Code Available | 2 |
| DAVE -- A Detect-and-Verify Paradigm for Low-Shot Counting | Apr 25, 2024 | Exemplar-Free Counting, Few-shot Object Counting and Detection | Code Available | 2 |
| CFMW: Cross-modality Fusion Mamba for Multispectral Object Detection under Adverse Weather Conditions | Apr 25, 2024 | Mamba, Multispectral Object Detection | Code Available | 2 |
| Commonsense Prototype for Outdoor Unsupervised 3D Object Detection | Apr 25, 2024 | 3D Object Detection, Object | Code Available | 2 |
| X-Ray: A Sequential 3D Representation For Generation | Apr 22, 2024 | 3D Generation, Object | Code Available | 2 |
| Augmented Object Intelligence with XR-Objects | Apr 20, 2024 | Object, Semantic Segmentation | Code Available | 2 |
| MambaMOS: LiDAR-based 3D Moving Object Segmentation with Motion-aware State Space Model | Apr 19, 2024 | Object, Semantic Segmentation | Code Available | 2 |
| Salient Object-Aware Background Generation using Text-Guided Diffusion Models | Apr 15, 2024 | Object | Code Available | 2 |
| SFSORT: Scene Features-based Simple Online Real-Time Tracker | Apr 11, 2024 | CPU, Multi-Object Tracking | Code Available | 2 |
| Learning Embeddings with Centroid Triplet Loss for Object Identification in Robotic Grasping | Apr 9, 2024 | Image Retrieval, Object | Code Available | 2 |
| YOLC: You Only Look Clusters for Tiny Object Detection in Aerial Images | Apr 9, 2024 | Object, object-detection | Code Available | 2 |
| Joint Reconstruction of 3D Human and Object via Contact-Based Refinement Transformer | Apr 7, 2024 | 3D Human Reconstruction, 3D Object Reconstruction | Code Available | 2 |
| Is CLIP the main roadblock for fine-grained open-world perception? | Apr 4, 2024 | Autonomous Driving, Novel Concepts | Code Available | 2 |
| MonoCD: Monocular 3D Object Detection with Complementary Depths | Apr 4, 2024 | 3D Object Detection, Depth Estimation | Code Available | 2 |
| DQ-DETR: DETR with Dynamic Query for Tiny Object Detection | Apr 4, 2024 | Object, object-detection | Code Available | 2 |
| EGTR: Extracting Graph from Transformer for Scene Graph Generation | Apr 2, 2024 | Graph Generation, Multi-Task Learning | Code Available | 2 |
| Scene Adaptive Sparse Transformer for Event-based Object Detection | Apr 2, 2024 | Object, object-detection | Code Available | 2 |
| Text2HOI: Text-guided 3D Motion Generation for Hand-Object Interaction | Mar 31, 2024 | Motion Generation, Object | Code Available | 2 |
| DVIS-DAQ: Improving Video Segmentation via Dynamic Anchor Queries | Mar 29, 2024 | Object, Video Instance Segmentation | Code Available | 2 |
| Total-Decom: Decomposed 3D Scene Reconstruction with Minimal Interaction | Mar 28, 2024 | 3D geometry, 3D Reconstruction | Code Available | 2 |
| Efficient Image Pre-Training with Siamese Cropped Masked Autoencoders | Mar 26, 2024 | Object, Self-Supervised Learning | Code Available | 2 |
| Efficient Video Object Segmentation via Modulated Cross-Attention Memory | Mar 26, 2024 | GPU, Object | Code Available | 2 |
| EgoLifter: Open-world 3D Segmentation for Egocentric Perception | Mar 26, 2024 | 3D Reconstruction, Object | Code Available | 2 |
| Elysium: Exploring Object-level Perception in Videos via MLLM | Mar 25, 2024 | Object, Object Tracking | Code Available | 2 |
| InterFusion: Text-Driven Generation of 3D Human-Object Interaction | Mar 22, 2024 | 3D Generation, global-optimization | Code Available | 2 |
| CPA-Enhancer: Chain-of-Thought Prompted Adaptive Enhancer for Object Detection under Unknown Degradations | Mar 17, 2024 | Object, object-detection | Code Available | 2 |
| NetTrack: Tracking Highly Dynamic Objects with a Net | Mar 17, 2024 | Multi-Object Tracking, Object | Code Available | 2 |
| Magic Tokens: Select Diverse Tokens for Multi-modal Object Re-Identification | Mar 15, 2024 | Object | Code Available | 2 |
| Generative Region-Language Pretraining for Open-Ended Object Detection | Mar 15, 2024 | Language Modeling, Language Modelling | Code Available | 2 |
| Knowledge Distillation in YOLOX-ViT for Side-Scan Sonar Object Detection | Mar 14, 2024 | Knowledge Distillation, Novel Object Detection | Code Available | 2 |