SOTAVerified

Object Localization

Object Localization is the task of locating an instance of a particular object category in an image, typically by specifying a tightly cropped bounding box centered on the instance. An object proposal specifies a candidate bounding box, and an object proposal is said to be a correct localization if it sufficiently overlaps a human-labeled “ground-truth” bounding box for the given object. In the literature, the “Object Localization” task is to locate one instance of an object category, whereas “object detection” focuses on locating all instances of a category in a given image.

Source: Fast On-Line Kernel Density Estimation for Active Object Localization
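
The "sufficiently overlaps" criterion in the definition above is commonly measured with intersection-over-union (IoU). The sketch below is one illustrative reading, not the cited paper's method. The `(x_min, y_min, x_max, y_max)` box layout, the helper names `iou` and `is_correct_localization`, and the 0.5 threshold are all assumptions chosen for the example.

```python
from typing import Tuple

# Assumed box layout: (x_min, y_min, x_max, y_max) in pixel coordinates.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    # Width and height of the overlap region, clamped at zero when disjoint.
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h

    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter

    # Degenerate (zero-area) boxes yield no overlap rather than a division error.
    return inter / union if union > 0 else 0.0


def is_correct_localization(proposal: Box, ground_truth: Box,
                            threshold: float = 0.5) -> bool:
    """Hypothetical helper: a proposal counts as correct when its IoU with
    the ground-truth box reaches the threshold (0.5 is an assumed default)."""
    return iou(proposal, ground_truth) >= threshold
```

For example, a proposal shifted by half its width against an equally sized ground-truth box has an IoU of 1/3 and would be rejected at a 0.5 threshold.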

Papers

Showing 1-50 of 617 papers

Title | Status | Hype
Qwen2.5-VL Technical Report | Code | 11
Bilateral Reference for High-Resolution Dichotomous Image Segmentation | Code | 7
VisionLLM v2: An End-to-End Generalist Multimodal Large Language Model for Hundreds of Vision-Language Tasks | Code | 5
Mamba-FETrack: Frame-Event Tracking via State Space Model | Code | 4
The All-Seeing Project V2: Towards General Relation Comprehension of the Open World | Code | 4
Prismatic VLMs: Investigating the Design Space of Visually-Conditioned Language Models | Code | 4
LangSplat: 3D Language Gaussian Splatting | Code | 3
DynaMem: Online Dynamic Spatio-Semantic Memory for Open World Mobile Manipulation | Code | 3
CrossOver: 3D Scene Cross-Modal Alignment | Code | 3
Locate 3D: Real-World Object Localization via Self-Supervised Learning in 3D | Code | 3
CoDA: Collaborative Novel Box Discovery and Cross-modal Alignment for Open-vocabulary 3D Object Detection | Code | 2
Contrastive learning of Class-agnostic Activation Map for Weakly Supervised Object Localization and Semantic Segmentation | Code | 2
Omnidirectional Multi-Object Tracking | Code | 2
Crafting Better Contrastive Views for Siamese Representation Learning | Code | 2
BOP Challenge 2020 on 6D Object Localization | Code | 2
C2AM: Contrastive Learning of Class-Agnostic Activation Map for Weakly Supervised Object Localization and Semantic Segmentation | Code | 2
Point Segment and Count: A Generalized Framework for Object Counting | Code | 2
Removal then Selection: A Coarse-to-Fine Fusion Perspective for RGB-Infrared Object Detection | Code | 2
A Novel Unified Architecture for Low-Shot Counting by Detection and Segmentation | Code | 2
Deep Snake for Real-Time Instance Segmentation | Code | 2
Kimera: from SLAM to Spatial Perception with 3D Dynamic Scene Graphs | Code | 2
Many-Shot In-Context Learning in Multimodal Foundation Models | Code | 2
Roboflow 100: A Rich, Multi-Domain Object Detection Benchmark | Code | 2
Context-Aware Entity Grounding with Open-Vocabulary 3D Scene Graphs | Code | 1
Context-Aware 3D Object Localization from Single Calibrated Images: A Study of Basketballs | Code | 1
Dual Attention U-Net with Feature Infusion: Pushing the Boundaries of Multiclass Defect Segmentation | Code | 1
Dual Progressive Transformations for Weakly Supervised Semantic Segmentation | Code | 1
Dual-attention Guided Dropblock Module for Weakly Supervised Object Localization | Code | 1
An Attention-guided Multistream Feature Fusion Network for Localization of Risky Objects in Driving Videos | Code | 1
Anchor-free Small-scale Multispectral Pedestrian Detection | Code | 1
DETReg: Unsupervised Pretraining with Region Priors for Object Detection | Code | 1
Discriminative Sounding Objects Localization via Self-supervised Audiovisual Matching | Code | 1
CLIP the Gap: A Single Domain Generalization Approach for Object Detection | Code | 1
Background Activation Suppression for Weakly Supervised Object Localization | Code | 1
A Low-Shot Object Counting Network With Iterative Prototype Adaptation | Code | 1
CoWs on Pasture: Baselines and Benchmarks for Language-Driven Zero-Shot Object Navigation | Code | 1
Distilling Knowledge from Refinement in Multiple Instance Detection Networks | Code | 1
DAFNe: A One-Stage Anchor-Free Approach for Oriented Object Detection | Code | 1
CutMix: Regularization Strategy to Train Strong Classifiers with Localizable Features | Code | 1
DanceTrack: Multi-Object Tracking in Uniform Appearance and Diverse Motion | Code | 1
Audio-Visual Grouping Network for Sound Localization from Mixtures | Code | 1
Class-aware Sounding Objects Localization via Audiovisual Correspondence | Code | 1
Cross-Modal Weighting Network for RGB-D Salient Object Detection | Code | 1
CIA-SSD: Confident IoU-Aware Single-Stage Object Detector From Point Cloud | Code | 1
Cascade-DETR: Delving into High-Quality Universal Object Detection | Code | 1
CAM Back Again: Large Kernel CNNs from a Weakly Supervised Object Localization Perspective | Code | 1
Agent Journey Beyond RGB: Unveiling Hybrid Semantic-Spatial Environmental Representations for Vision-and-Language Navigation | Code | 1
Building Calibrated Deep Models via Uncertainty Matching with Auxiliary Interval Predictors | Code | 1
Background Activation Suppression for Weakly Supervised Object Localization and Semantic Segmentation | Code | 1
DeepCut: Unsupervised Segmentation using Graph Neural Networks Clustering | Code | 1

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | OSMaN | RGSPL | 32.99 | | Unverified
2 | SUSA | RGSPL | 27.31 | | Unverified
3 | Shanks | RGSPL | 22.85 | | Unverified
4 | CVPR22 | RGSPL | 22.06 | | Unverified
5 | damm1 | RGSPL | 15.96 | | Unverified
6 | 1637 | RGSPL | 14.03 | | Unverified
7 | init. PREVALENT | RGSPL | 13.51 | | Unverified
8 | Airbert | RGSPL | 13.28 | | Unverified
9 | init. OSCAR | RGSPL | 10 | | Unverified
10 | SIA | RGSPL | 9.2 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | VoxelNet | AP | 89.35 | | Unverified
2 | VoxelNet | AP | 89.35 | | Unverified
3 | Frustum PointNets | AP | 88.7 | | Unverified
4 | Frustum PointNets | AP | 81.2 | | Unverified
5 | VoxelNet | AP | 77.47 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Frustum-PointPillars | AP | 48.3 | | Unverified
2 | Frustum PointNets | AP | 47.2 | | Unverified
3 | Frustum PointNets | AP | 40.23 | | Unverified
4 | VoxelNet | AP | 38.11 | | Unverified
5 | VoxelNet | AP | 31.51 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Frustum-PointPillars | AP | 52.23 | | Unverified
2 | Frustum PointNets | AP | 50.22 | | Unverified
3 | Frustum PointNets | AP | 42.15 | | Unverified
4 | VoxelNet | AP | 40.74 | | Unverified
5 | VoxelNet | AP | 33.69 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | VoxelNet | AP | 77.39 | | Unverified
2 | Frustum PointNets | AP | 75.33 | | Unverified
3 | Frustum PointNets | AP | 62.19 | | Unverified
4 | VoxelNet | AP | 57.73 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Frustum PointNets | AP | 75.38 | | Unverified
2 | Frustum PointNets | AP | 71.96 | | Unverified
3 | VoxelNet | AP | 66.7 | | Unverified
4 | VoxelNet | AP | 61.22 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Frustum PointNets | AP | 61.96 | | Unverified
2 | Frustum PointNets | AP | 56.77 | | Unverified
3 | VoxelNet | AP | 54.76 | | Unverified
4 | VoxelNet | AP | 48.36 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Frustum PointNets | AP | 58.09 | | Unverified
2 | Frustum PointNets | AP | 51.21 | | Unverified
3 | VoxelNet | AP | 46.13 | | Unverified
4 | VoxelNet | AP | 39.48 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Unified-IOXL | Localization (ablation) | 67 | | Unverified
2 | GPV-2 | Localization (ablation) | 53.6 | | Unverified
3 | Mask R-CNN | Localization (ablation) | 44.7 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Frustum PointNets | AP | 54.68 | | Unverified
2 | VoxelNet | AP | 50.55 | | Unverified
3 | Frustum PointNets | AP | 50.39 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | GPT4-Vision 4-shot+CoT | Accuracy | 49.7 | | Unverified
2 | Gemini-Pro 4-shot+CoT | Accuracy | 33.9 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Frustum PointNets | AP | 84 | | Unverified
2 | VoxelNet | AP | 79.26 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Frustum-PointPillars | AP | 60.98 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Hausdorff Loss | Precision | 88.1 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | ours | CorLoc | 41.2 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | ours | CorLoc | 47.45 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Hausdorff Loss | F-Score | 88.6 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Hausdorff Loss | Recall | 89.2 | | Unverified