| LiveScene: Language Embedding Interactive Radiance Fields for Physical Scene Rendering and Control | Jun 23, 2024 | Novel View Synthesis, Object | Unverified | 0 |
| MVOC: a training-free multiple video object composition method with diffusion models | Jun 22, 2024 | Image to Video Generation, Object | Code Available | 1 |
| Unseen Object Reasoning with Shared Appearance Cues | Jun 21, 2024 | Diversity, Object | Code Available | 0 |
| Contextual Interaction via Primitive-based Adversarial Training For Compositional Zero-shot Learning | Jun 21, 2024 | Attribute, Compositional Zero-Shot Learning | Code Available | 0 |
| DiPEx: Dispersing Prompt Expansion for Class-Agnostic Object Detection | Jun 21, 2024 | Class-agnostic Object Detection, Multi-object discovery | Code Available | 1 |
| GIC: Gaussian-Informed Continuum for Physical Property Identification and Simulation | Jun 21, 2024 | Object | Unverified | 0 |
| Image Conductor: Precision Control for Interactive Video Synthesis | Jun 21, 2024 | Object | Unverified | 0 |
| African or European Swallow? Benchmarking Large Vision-Language Models for Fine-Grained Object Classification | Jun 20, 2024 | Benchmarking, Classification | Code Available | 1 |
| LeYOLO, New Scalable and Efficient CNN Architecture for Object Detection | Jun 20, 2024 | Computational Efficiency, Object | Code Available | 2 |
| Two-Stage Depth Enhanced Learning with Obstacle Map For Object Navigation | Jun 20, 2024 | Navigate, Object | Unverified | 0 |
| CooHOI: Learning Cooperative Human-Object Interaction with Manipulated Object Dynamics | Jun 20, 2024 | Human-Object Interaction Detection, Humanoid Control | Unverified | 0 |
| 3D Instance Segmentation Using Deep Learning on RGB-D Indoor Data | Jun 19, 2024 | 3D Instance Segmentation, 3D Object Recognition | Unverified | 0 |
| On rough mereology and VC-dimension in treatment of decision prediction for open world decision systems | Jun 19, 2024 | Object | Unverified | 0 |
| SMORE: Simultaneous Map and Object REconstruction | Jun 19, 2024 | Depth Completion, Dynamic Reconstruction | Unverified | 0 |
| Semantic Enhanced Few-shot Object Detection | Jun 19, 2024 | Few-Shot Object Detection, Object | Unverified | 0 |
| AGLA: Mitigating Object Hallucinations in Large Vision-Language Models with Assembly of Global and Local Attention | Jun 18, 2024 | Object, Response Generation | Code Available | 2 |
| Certified ML Object Detection for Surveillance Missions | Jun 18, 2024 | Object, object-detection | Unverified | 0 |
| GroPrompt: Efficient Grounded Prompting and Adaptation for Referring Video Object Segmentation | Jun 18, 2024 | Contrastive Learning, Object | Unverified | 0 |
| Beyond Visual Appearances: Privacy-sensitive Objects Identification via Hybrid Graph Reasoning | Jun 18, 2024 | Data Augmentation, Graph Generation | Unverified | 0 |
| Overlap Suppression Clustering for Offline Multi-Camera People Tracking | Jun 17, 2024 | Clustering, Multi-Object Tracking | Unverified | 0 |
| Online Multi-camera People Tracking with Spatial-temporal Mechanism and Anchor-feature Hierarchical Clustering | Jun 17, 2024 | Multi-Object Tracking, Object | Code Available | 0 |
| Task Me Anything | Jun 17, 2024 | 2k, Attribute | Code Available | 2 |
| V3Det Challenge 2024 on Vast Vocabulary and Open Vocabulary Object Detection: Methods and Results | Jun 17, 2024 | Object, object-detection | Unverified | 0 |
| CustAny: Customizing Anything from A Single Example | Jun 17, 2024 | Object, Virtual Try-on | Code Available | 1 |
| Composing Object Relations and Attributes for Image-Text Matching | Jun 17, 2024 | Attribute, Graph Attention | Code Available | 1 |
| YOLO-FEDER FusionNet: A Novel Deep Learning Architecture for Drone Detection | Jun 17, 2024 | Object, object-detection | Unverified | 0 |
| Duoduo CLIP: Efficient 3D Understanding with Multi-View Images | Jun 17, 2024 | GPU, Object | Code Available | 2 |
| Syn-to-Real Unsupervised Domain Adaptation for Indoor 3D Object Detection | Jun 17, 2024 | 3D Object Detection, Domain Adaptation | Code Available | 0 |
| Reminding Multimodal Large Language Models of Object-aware Knowledge with Retrieved Tags | Jun 16, 2024 | Image to text, Instruction Following | Unverified | 0 |
| SparseDet: A Simple and Effective Framework for Fully Sparse LiDAR-based 3D Object Detection | Jun 16, 2024 | 3D Object Detection, Autonomous Driving | Unverified | 0 |
| Object Detection using Oriented Window Learning Vision Transformer: Roadway Assets Recognition | Jun 15, 2024 | Autonomous Driving, Object | Unverified | 0 |
| Object-Attribute-Relation Representation Based Video Semantic Communication | Jun 15, 2024 | Attribute, Object | Unverified | 0 |
| A Rao-Blackwellized Particle Filter for Superelliptical Extended Target Tracking | Jun 14, 2024 | Object | Unverified | 0 |
| Details Make a Difference: Object State-Sensitive Neurorobotic Task Planning | Jun 14, 2024 | Dense Captioning, Object | Code Available | 0 |
| Crafting Parts for Expressive Object Composition | Jun 14, 2024 | Denoising, Image Generation | Unverified | 0 |
| Neural Pose Representation Learning for Generating and Transferring Non-Rigid Object Poses | Jun 14, 2024 | Disentanglement, Object | Unverified | 0 |
| NeST: Neural Stress Tensor Tomography by leveraging 3D Photoelasticity | Jun 14, 2024 | Object, Transparent objects | Unverified | 0 |
| Make It Count: Text-to-Image Generation with an Accurate Number of Objects | Jun 14, 2024 | Denoising, Image Generation | Code Available | 2 |
| InstructRL4Pix: Training Diffusion for Image Editing by Reinforcement Learning | Jun 14, 2024 | Object, reinforcement-learning | Unverified | 0 |
| ImageNet3D: Towards General-Purpose Object-Level 3D Understanding | Jun 13, 2024 | Image Captioning, Linear Probing Object-Level 3D Awareness | Code Available | 1 |
| Adaptive Slot Attention: Object Discovery with Dynamic Slot Number | Jun 13, 2024 | Decoder, Object | Code Available | 0 |
| MMRel: A Relation Understanding Benchmark in the MLLM Era | Jun 13, 2024 | Diversity, Hallucination | Code Available | 1 |
| CLIPAway: Harmonizing Focused Embeddings for Removing Objects via Diffusion Models | Jun 13, 2024 | Object | Code Available | 2 |
| Neural Assets: 3D-Aware Multi-Object Scene Synthesis with Image Diffusion Models | Jun 13, 2024 | Object | Unverified | 0 |
| STAR: A First-Ever Dataset and A Large-Scale Benchmark for Scene Graph Generation in Large-Size Satellite Imagery | Jun 13, 2024 | Graph Generation, Object | Code Available | 2 |
| Enhanced Object Detection: A Study on Vast Vocabulary Object Detection Track for V3Det Challenge 2024 | Jun 13, 2024 | Object, object-detection | Unverified | 0 |
| 3D-AVS: LiDAR-based 3D Auto-Vocabulary Segmentation | Jun 13, 2024 | Autonomous Driving, Object | Code Available | 1 |
| Interpreting the structure of multi-object representations in vision encoders | Jun 13, 2024 | Object | Unverified | 0 |
| DiffPop: Plausibility-Guided Object Placement Diffusion for Image Composition | Jun 12, 2024 | Data Augmentation, Denoising | Unverified | 0 |
| Dataset Enhancement with Instance-Level Augmentations | Jun 12, 2024 | Data Augmentation, Object | Code Available | 1 |