SOTAVerified

Open Vocabulary Object Detection

Open-vocabulary detection (OVD) aims to generalize beyond the limited number of base classes labeled during the training phase. The goal is to detect novel classes defined by an unbounded (open) vocabulary at inference.
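
As a rough illustration of how a class vocabulary can stay open at inference, the sketch below scores one region embedding against text embeddings of class names by cosine similarity. This is one common formulation, assumed here for illustration; the page itself does not prescribe a method. The function names and the 3-dimensional toy embeddings are hypothetical, and a real system would obtain both embeddings from a vision-language model.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def classify_region(region_embedding, text_embeddings):
    """Return (class_name, similarity) of the best-matching class name."""
    scores = {name: cosine(region_embedding, emb)
              for name, emb in text_embeddings.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]

# Toy embeddings, made up for illustration only.
text_embeddings = {
    "cat": [1.0, 0.0, 0.0],    # base class (labeled during training)
    "dog": [0.0, 1.0, 0.0],    # base class (labeled during training)
    "zebra": [0.0, 0.0, 1.0],  # novel class, added to the vocabulary at inference
}

region = [0.1, 0.2, 0.9]  # hypothetical embedding of one detected region
label, score = classify_region(region, text_embeddings)
print(label, round(score, 3))
```

Because the classifier is just a lookup over name embeddings, adding a novel class in this sketch means adding one more entry to `text_embeddings`, with no retraining step.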

Papers

Showing 1-50 of 145 papers

Title | Status | Hype
VLM-R1: A Stable and Generalizable R1-style Large Vision-Language Model | Code | 9
YOLO-World: Real-Time Open-Vocabulary Object Detection | Code | 9
Visual-RFT: Visual Reinforcement Fine-Tuning | Code | 7
Real-time Transformer-based Open-Vocabulary Detection with Efficient Fusion Head | Code | 5
FG-CLIP: Fine-Grained Visual and Textual Alignment | Code | 4
Seg-Zero: Reasoning-Chain Guided Segmentation via Cognitive Reinforcement | Code | 4
GLIPv2: Unifying Localization and Vision-Language Understanding | Code | 4
Locate Anything on Earth: Advancing Open-Vocabulary Object Detection for Remote Sensing Community | Code | 3
OVLW-DETR: Open-Vocabulary Light-Weighted Detection Transformer | Code | 3
OmDet: Large-scale vision-language multi-dataset pre-training with multimodal detection network | Code | 3
Detecting Twenty-thousand Classes using Image-level Supervision | Code | 3
Open Vocabulary Monocular 3D Object Detection | Code | 2
Mamba-YOLO-World: Marrying YOLO-World with Mamba for Open-Vocabulary Detection | Code | 2
LaMI-DETR: Open-Vocabulary Detection with Language Model Instruction | Code | 2
SHiNe: Semantic Hierarchy Nexus for Open-vocabulary Object Detection | Code | 2
Is CLIP the main roadblock for fine-grained open-world perception? | Code | 2
Generative Region-Language Pretraining for Open-Ended Object Detection | Code | 2
YOLOv8-AM: YOLOv8 Based on Effective Attention Mechanisms for Pediatric Wrist Fracture Detection | Code | 2
Cross-Domain Few-Shot Object Detection via Enhanced Open-Set Object Detector | Code | 2
CLIPSelf: Vision Transformer Distills Itself for Open-Vocabulary Dense Prediction | Code | 2
Detect Everything with Few Examples | Code | 2
OpenIns3D: Snap and Lookup for 3D Open-vocabulary Instance Segmentation | Code | 2
PointCLIP V2: Prompting CLIP and GPT for Powerful 3D Open-world Learning | Code | 2
Bridging the Gap between Object and Image-level Representations for Open-Vocabulary Detection | Code | 2
Open-Vocabulary DETR with Conditional Matching | Code | 2
Superpowering Open-Vocabulary Object Detectors for X-ray Vision | Code | 1
A Hierarchical Semantic Distillation Framework for Open-Vocabulary Object Detection | Code | 1
OW-OVD: Unified Open World and Open Vocabulary Object Detection | Code | 1
Comprehensive Multi-Modal Prototypes are Simple and Effective Classifiers for Vast-Vocabulary Object Detection | Code | 1
From Open Vocabulary to Open World: Teaching Vision Language Models to Detect Novel Objects | Code | 1
OVT-B: A New Large-Scale Benchmark for Open-Vocabulary Multi-Object Tracking | Code | 1
SIA-OVD: Shape-Invariant Adapter for Bridging the Image-Region Gap in Open-Vocabulary Detection | Code | 1
Query3D: LLM-Powered Open-Vocabulary Scene Segmentation with Language Embedded 3D Gaussian | Code | 1
MarvelOVD: Marrying Object Recognition and Vision-Language Models for Robust Open-Vocabulary Object Detection | Code | 1
DART: An Automated End-to-End Object Detection Pipeline with Data Diversification, Open-Vocabulary Bounding Box Annotation, Pseudo-Label Review, and Model Training | Code | 1
OVMR: Open-Vocabulary Recognition with Multi-Modal References | Code | 1
RTGen: Generating Region-Text Pairs for Open-Vocabulary Object Detection | Code | 1
OV-DQUO: Open-Vocabulary DETR with Denoising Text Query Training and Open-World Unknown Objects Supervision | Code | 1
The devil is in the object boundary: towards annotation-free instance segmentation using Foundation Models | Code | 1
Training-free Boost for Open-Vocabulary Object Detection with Confidence Aggregation | Code | 1
Retrieval-Augmented Open-Vocabulary Object Detection | Code | 1
VisionGPT: LLM-Assisted Real-Time Anomaly Detection for Safe Visual Navigation | Code | 1
GroundVLP: Harnessing Zero-shot Visual Grounding from Vision-Language Pre-training and Open-Vocabulary Object Detection | Code | 1
CLIM: Contrastive Language-Image Mosaic for Region Representation | Code | 1
Simple Image-level Classification Improves Open-vocabulary Object Detection | Code | 1
ProxyDet: Synthesizing Proxy Novel Classes via Classwise Mixup for Open-Vocabulary Object Detection | Code | 1
The devil is in the fine-grained details: Evaluating open-vocabulary object detectors for fine-grained understanding | Code | 1
Toward Open Vocabulary Aerial Object Detection with CLIP-Activated Student-Teacher Learning | Code | 1
Enhancing Novel Object Detection via Cooperative Foundational Models | Code | 1
Expanding Scene Graph Boundaries: Fully Open-vocabulary Scene Graph Generation via Visual-Concept Alignment and Retention | Code | 1

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | Cooperative Foundational Models | AP 0.5 | 50.3 | - | Unverified
2 | DE-ViT | AP 0.5 | 50 | - | Unverified
3 | Yolov8-nano | AP 0.5 | 47.2 | - | Unverified
4 | DITO | AP 0.5 | 46.1 | - | Unverified
5 | OV-DQUO(RN50x4) | AP 0.5 | 45.6 | - | Unverified
6 | LP-OVOD (OWL-ViT Proposals) | AP 0.5 | 44.9 | - | Unverified
7 | CLIPSelf | AP 0.5 | 44.3 | - | Unverified
8 | CORA+ | AP 0.5 | 43.1 | - | Unverified
9 | BARON | AP 0.5 | 42.7 | - | Unverified
10 | SIA-OVD (RN50x4) | AP 0.5 | 41.9 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | LaMI-DETR | AP novel-LVIS base training | 43.4 | - | Unverified
2 | DITO | AP novel-LVIS base training | 40.4 | - | Unverified
3 | OV-DQUO(ViT-L/14) | AP novel-LVIS base training | 39.3 | - | Unverified
4 | CoDet (EVA02-L) | AP novel-LVIS base training | 37 | - | Unverified
5 | CLIPSelf | AP novel-LVIS base training | 34.9 | - | Unverified
6 | OVMR | AP novel-LVIS base training | 34.4 | - | Unverified
7 | DE-ViT | AP novel-LVIS base training | 34.3 | - | Unverified
8 | CFM-ViT | AP novel-LVIS base training | 33.9 | - | Unverified
9 | CLIM (RN50x64) | AP novel-LVIS base training | 32.3 | - | Unverified
10 | RO-ViT | AP novel-LVIS base training | 32.1 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Object-Centric-OVD | mask AP50 | 22.3 | - | Unverified
2 | ViLD | mask AP50 | 18.2 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Object-Centric-OVD | mask AP50 | 42.9 | - | Unverified
2 | Detic | mask AP50 | 42.2 | - | Unverified