Open Vocabulary Object Detection

Open-vocabulary detection (OVD) aims to generalize beyond the limited number of base classes labeled during the training phase. The goal is to detect novel classes defined by an unbounded (open) vocabulary at inference.
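
As an illustration of what an open vocabulary at inference can look like, the sketch below labels region embeddings by cosine similarity to class-name embeddings. This is one common formulation, assumed here for illustration; the description above does not prescribe a method. The vectors are hand-made toy values standing in for real text and region embeddings, and the names `cosine` and `classify_regions` are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def classify_regions(region_embeddings, vocabulary):
    """Give each region the vocabulary name whose embedding is most similar."""
    labels = []
    for region in region_embeddings:
        best = max(vocabulary, key=lambda name: cosine(region, vocabulary[name]))
        labels.append(best)
    return labels


# Toy vectors (assumption): real systems would use learned embeddings.
base_vocabulary = {"cat": [1.0, 0.0, 0.0], "car": [0.0, 1.0, 0.0]}
regions = [[0.9, 0.1, 0.0], [0.1, 0.2, 0.9]]

# With only base classes, the second region is forced onto a base label.
labels_base = classify_regions(regions, base_vocabulary)

# A novel class is added at inference by supplying its name embedding only.
open_vocabulary = dict(base_vocabulary, umbrella=[0.0, 0.0, 1.0])
labels_open = classify_regions(regions, open_vocabulary)

print(labels_base)
print(labels_open)
```

In this sketch the classifier has no fixed output layer: adding a class changes the vocabulary dictionary and nothing else, which is what lets the label set stay open.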

Papers

Showing 1–50 of 145 papers

Title	Date	Tasks	Status	Hype
VLM-R1: A Stable and Generalizable R1-style Large Vision-Language Model	Apr 10, 2025	Language Modeling, Language Modelling	Code Available	9
YOLO-World: Real-Time Open-Vocabulary Object Detection	Jan 30, 2024	Instance Segmentation, Language Modeling	Code Available	9
Visual-RFT: Visual Reinforcement Fine-Tuning	Mar 3, 2025	Few-Shot Object Detection, Fine-Grained Image Classification	Code Available	7
Real-time Transformer-based Open-Vocabulary Detection with Efficient Fusion Head	Mar 11, 2024	Object Detection, Open-vocabulary object detection	Code Available	5
FG-CLIP: Fine-Grained Visual and Textual Alignment	May 8, 2025	Image-text Retrieval, object-detection	Code Available	4
Seg-Zero: Reasoning-Chain Guided Segmentation via Cognitive Reinforcement	Mar 9, 2025	Domain Generalization, Object Detection	Code Available	4
GLIPv2: Unifying Localization and Vision-Language Understanding	Jun 12, 2022	2D Object Detection, Contrastive Learning	Code Available	4
Locate Anything on Earth: Advancing Open-Vocabulary Object Detection for Remote Sensing Community	Aug 17, 2024	Novel Concepts, Object	Code Available	3
OVLW-DETR: Open-Vocabulary Light-Weighted Detection Transformer	Jul 15, 2024	Language Modeling, Language Modelling	Code Available	3
OmDet: Large-scale vision-language multi-dataset pre-training with multimodal detection network	Sep 10, 2022	Continual Learning, Object	Code Available	3
Detecting Twenty-thousand Classes using Image-level Supervision	Jan 7, 2022	Cross-Domain Few-Shot Object Detection, image-classification	Code Available	3
Open Vocabulary Monocular 3D Object Detection	Nov 25, 2024	3D Object Detection, Monocular 3D Object Detection	Code Available	2
Mamba-YOLO-World: Marrying YOLO-World with Mamba for Open-Vocabulary Detection	Sep 13, 2024	Mamba, Open Vocabulary Object Detection	Code Available	2
LaMI-DETR: Open-Vocabulary Detection with Language Model Instruction	Jul 16, 2024	Language Modeling, Language Modelling	Code Available	2
SHiNe: Semantic Hierarchy Nexus for Open-vocabulary Object Detection	May 16, 2024	object-detection, Object Detection	Code Available	2
Is CLIP the main roadblock for fine-grained open-world perception?	Apr 4, 2024	Autonomous Driving, Novel Concepts	Code Available	2
Generative Region-Language Pretraining for Open-Ended Object Detection	Mar 15, 2024	Language Modeling, Language Modelling	Code Available	2
YOLOv8-AM: YOLOv8 Based on Effective Attention Mechanisms for Pediatric Wrist Fracture Detection	Feb 14, 2024	Fracture detection, medical image detection	Code Available	2
Cross-Domain Few-Shot Object Detection via Enhanced Open-Set Object Detector	Feb 5, 2024	Cross-Domain Few-Shot, Cross-Domain Few-Shot Object Detection	Code Available	2
CLIPSelf: Vision Transformer Distills Itself for Open-Vocabulary Dense Prediction	Oct 2, 2023	image-classification, Image Classification	Code Available	2
Detect Everything with Few Examples	Sep 22, 2023	Binary Classification, Cross-Domain Few-Shot Object Detection	Code Available	2
OpenIns3D: Snap and Lookup for 3D Open-vocabulary Instance Segmentation	Sep 1, 2023	3D Open-Vocabulary Instance Segmentation, 3D Open-Vocabulary Object Detection	Code Available	2
PointCLIP V2: Prompting CLIP and GPT for Powerful 3D Open-world Learning	Nov 21, 2022	3D Classification, 3D Object Detection	Code Available	2
Bridging the Gap between Object and Image-level Representations for Open-Vocabulary Detection	Jul 7, 2022	Object, Open Vocabulary Attribute Detection	Code Available	2
Open-Vocabulary DETR with Conditional Matching	Mar 22, 2022	Language Modelling, object-detection	Code Available	2
Superpowering Open-Vocabulary Object Detectors for X-ray Vision	Mar 21, 2025	object-detection, Object Detection	Code Available	1
A Hierarchical Semantic Distillation Framework for Open-Vocabulary Object Detection	Mar 13, 2025	object-detection, Object Detection	Code Available	1
OW-OVD: Unified Open World and Open Vocabulary Object Detection	Jan 1, 2025	Attribute, Incremental Learning	Code Available	1
Comprehensive Multi-Modal Prototypes are Simple and Effective Classifiers for Vast-Vocabulary Object Detection	Dec 23, 2024	object-detection, Object Detection	Code Available	1
From Open Vocabulary to Open World: Teaching Vision Language Models to Detect Novel Objects	Nov 27, 2024	Autonomous Driving, Object	Code Available	1
OVT-B: A New Large-Scale Benchmark for Open-Vocabulary Multi-Object Tracking	Oct 23, 2024	Multi-Object Tracking, Object	Code Available	1
SIA-OVD: Shape-Invariant Adapter for Bridging the Image-Region Gap in Open-Vocabulary Detection	Oct 8, 2024	object-detection, Object Detection	Code Available	1
Query3D: LLM-Powered Open-Vocabulary Scene Segmentation with Language Embedded 3D Gaussian	Aug 7, 2024	Autonomous Driving, object-detection	Code Available	1
MarvelOVD: Marrying Object Recognition and Vision-Language Models for Robust Open-Vocabulary Object Detection	Jul 31, 2024	Language Modelling, Object	Code Available	1
DART: An Automated End-to-End Object Detection Pipeline with Data Diversification, Open-Vocabulary Bounding Box Annotation, Pseudo-Label Review, and Model Training	Jul 12, 2024	Image Generation, Object	Code Available	1
OVMR: Open-Vocabulary Recognition with Multi-Modal References	Jun 7, 2024	Open Vocabulary Object Detection	Code Available	1
RTGen: Generating Region-Text Pairs for Open-Vocabulary Object Detection	May 30, 2024	Image Captioning, Image Inpainting	Code Available	1
OV-DQUO: Open-Vocabulary DETR with Denoising Text Query Training and Open-World Unknown Objects Supervision	May 28, 2024	Contrastive Learning, Denoising	Code Available	1
The devil is in the object boundary: towards annotation-free instance segmentation using Foundation Models	Apr 18, 2024	Instance Segmentation, Object	Code Available	1
Training-free Boost for Open-Vocabulary Object Detection with Confidence Aggregation	Apr 12, 2024	Object, object-detection	Code Available	1
Retrieval-Augmented Open-Vocabulary Object Detection	Apr 8, 2024	Language Modeling, Language Modelling	Code Available	1
VisionGPT: LLM-Assisted Real-Time Anomaly Detection for Safe Visual Navigation	Mar 19, 2024	Anomaly Detection, object-detection	Code Available	1
GroundVLP: Harnessing Zero-shot Visual Grounding from Vision-Language Pre-training and Open-Vocabulary Object Detection	Dec 22, 2023	Attribute, object-detection	Code Available	1
CLIM: Contrastive Language-Image Mosaic for Region Representation	Dec 18, 2023	Object, object-detection	Code Available	1
Simple Image-level Classification Improves Open-vocabulary Object Detection	Dec 16, 2023	Knowledge Distillation, Object	Code Available	1
ProxyDet: Synthesizing Proxy Novel Classes via Classwise Mixup for Open-Vocabulary Object Detection	Dec 12, 2023	object-detection, Object Detection	Code Available	1
The devil is in the fine-grained details: Evaluating open-vocabulary object detectors for fine-grained understanding	Nov 29, 2023	Object, object-detection	Code Available	1
Toward Open Vocabulary Aerial Object Detection with CLIP-Activated Student-Teacher Learning	Nov 20, 2023	Object, object-detection	Code Available	1
Enhancing Novel Object Detection via Cooperative Foundational Models	Nov 19, 2023	Novel Class Discovery, Novel Object Detection	Code Available	1
Expanding Scene Graph Boundaries: Fully Open-vocabulary Scene Graph Generation via Visual-Concept Alignment and Retention	Nov 18, 2023	Concept Alignment, Graph Generation	Code Available	1

Datasets: MSCOCO, LVIS v1.0, Objects365, OpenImages-v4

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	Cooperative Foundational Models	AP 0.5	50.3	—	Unverified
2	DE-ViT	AP 0.5	50	—	Unverified
3	Yolov8-nano	AP 0.5	47.2	—	Unverified
4	DITO	AP 0.5	46.1	—	Unverified
5	OV-DQUO(RN50x4)	AP 0.5	45.6	—	Unverified
6	LP-OVOD (OWL-ViT Proposals)	AP 0.5	44.9	—	Unverified
7	CLIPSelf	AP 0.5	44.3	—	Unverified
8	CORA+	AP 0.5	43.1	—	Unverified
9	BARON	AP 0.5	42.7	—	Unverified
10	SIA-OVD (RN50x4)	AP 0.5	41.9	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	LaMI-DETR	AP novel-LVIS base training	43.4	—	Unverified
2	DITO	AP novel-LVIS base training	40.4	—	Unverified
3	OV-DQUO(ViT-L/14)	AP novel-LVIS base training	39.3	—	Unverified
4	CoDet (EVA02-L)	AP novel-LVIS base training	37	—	Unverified
5	CLIPSelf	AP novel-LVIS base training	34.9	—	Unverified
6	OVMR	AP novel-LVIS base training	34.4	—	Unverified
7	DE-ViT	AP novel-LVIS base training	34.3	—	Unverified
8	CFM-ViT	AP novel-LVIS base training	33.9	—	Unverified
9	CLIM (RN50x64)	AP novel-LVIS base training	32.3	—	Unverified
10	RO-ViT	AP novel-LVIS base training	32.1	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	Object-Centric-OVD	mask AP50	22.3	—	Unverified
2	ViLD	mask AP50	18.2	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	Object-Centric-OVD	mask AP50	42.9	—	Unverified
2	Detic	mask AP50	42.2	—	Unverified