UniFa: A unified feature hallucination framework for any-shot object detection
Hui Nie, Ruiping Wang, Xilin Chen
Abstract
Any-shot object detection seeks to simultaneously detect base (many-shot), few-shot, and zero-shot categories. The primary challenge lies in the insufficient visual data for rare (few-shot and zero-shot) categories, which hinders effective training. Existing methods alleviate this with visual feature generation, but the quality of the generated features is low, and these methods are limited to the zero-shot object detection task (i.e., covering only zero-shot categories). This mainly arises because the semantic information used for feature generation is trained on unimodal data and thus lacks visual awareness, and because the generated features of different categories are overly distinct from one another. To tackle these issues, we introduce the Unified Feature Hallucination (UniFa) framework, which generates high-quality features for both kinds of rare categories. Utilizing CLIP’s text encoder, we transform category names into visually aware semantic embeddings for generating visual features, facilitating better visual-semantic alignment. A semantically blended feature enhancer merges features from any two categories, producing denser and more realistic features. The effectiveness of our approach is confirmed through extensive experiments on the MSCOCO dataset.
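To make the two ingredients of the abstract concrete, the following is a minimal PyTorch sketch of the general idea: category names are encoded with CLIP's text encoder (trained on image-text pairs, hence visually grounded), a conditional generator hallucinates region features from these embeddings, and a mixup-style blend between two categories densifies the synthetic feature space. Note that the prompt template, generator architecture, feature dimension, and blending rule below are illustrative assumptions for exposition, not UniFa's exact design.

```python
import torch
import clip  # OpenAI CLIP: pip install git+https://github.com/openai/CLIP.git

device = "cuda" if torch.cuda.is_available() else "cpu"
model, _ = clip.load("ViT-B/32", device=device)

# Step 1: visually-aware semantic embeddings from category names.
# CLIP's text encoder is trained on image-text pairs, so its embeddings
# carry visual grounding, unlike word vectors trained on text alone.
category_names = ["cat", "umbrella", "giraffe"]  # example rare categories
tokens = clip.tokenize([f"a photo of a {c}" for c in category_names]).to(device)
with torch.no_grad():
    sem = model.encode_text(tokens).float()        # (C, 512)
    sem = sem / sem.norm(dim=-1, keepdim=True)     # unit-normalize

# Step 2 (hypothetical stand-in for the feature generator): hallucinate
# visual features for a category from its semantic embedding plus noise.
feat_dim = 1024  # assumed detector RoI-feature dimension
generator = torch.nn.Sequential(
    torch.nn.Linear(sem.shape[1] + 64, 2048),
    torch.nn.LeakyReLU(0.2),
    torch.nn.Linear(2048, feat_dim),
).to(device)

def hallucinate(cat_idx: int, n: int = 8) -> torch.Tensor:
    """Generate n synthetic region features for one category."""
    z = torch.randn(n, 64, device=device)          # noise for diversity
    cond = sem[cat_idx].expand(n, -1)              # repeat class embedding
    return generator(torch.cat([cond, z], dim=1))

# Step 3 (hypothetical sketch of the "semantically blended feature
# enhancer"): convexly mix features of any two categories so the synthetic
# feature space is denser and less sharply separated per class.
def blend(idx_a: int, idx_b: int, n: int = 8) -> torch.Tensor:
    lam = torch.rand(n, 1, device=device)          # per-sample mixing weight
    return lam * hallucinate(idx_a, n) + (1 - lam) * hallucinate(idx_b, n)

mixed = blend(0, 1)   # features blended between "cat" and "umbrella"
print(mixed.shape)    # torch.Size([8, 1024])
```

In this reading, the blended features would train the detector's classifier alongside real base-category features; how UniFa supervises the blend (e.g., with interpolated labels or a separate enhancer network) is not specified in the abstract, so the convex mixing above is only one plausible instantiation.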