SOTAVerified

Object Localization

Object Localization is the task of locating an instance of a particular object category in an image, typically by specifying a tightly cropped bounding box centered on the instance. An object proposal specifies a candidate bounding box, and an object proposal is said to be a correct localization if it sufficiently overlaps a human-labeled “ground-truth” bounding box for the given object. In the literature, the “Object Localization” task is to locate one instance of an object category, whereas “object detection” focuses on locating all instances of a category in a given image.

Source: Fast On-Line Kernel Density Estimation for Active Object Localization
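The overlap criterion described above is commonly measured with intersection-over-union (IoU). The following is a minimal sketch, not a reference implementation: the `(x_min, y_min, x_max, y_max)` box convention, the function names, and the 0.5 threshold are all assumptions made for illustration, since the description above only says the overlap must be "sufficient".

```python
from typing import Tuple

# Assumed box convention: (x_min, y_min, x_max, y_max) in pixel coordinates.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    # Corners of the intersection rectangle (may be empty).
    ix_min, iy_min = max(a[0], b[0]), max(a[1], b[1])
    ix_max, iy_max = min(a[2], b[2]), min(a[3], b[3])

    # Clamp to zero so disjoint boxes yield no intersection area.
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)

    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def is_correct_localization(proposal: Box, ground_truth: Box,
                            threshold: float = 0.5) -> bool:
    """True if the proposal overlaps the ground truth by at least `threshold`.

    The 0.5 default is an assumed, commonly used value, not one given above.
    """
    return iou(proposal, ground_truth) >= threshold
```

For example, a proposal `(0, 0, 10, 5)` covers exactly half of a ground-truth box `(0, 0, 10, 10)`, so its IoU is 0.5 and it counts as correct under the assumed threshold, while a proposal that overlaps only one corner of the ground truth does not.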

Papers

Showing 151–200 of 617 papers

| Title | Status | Hype |
|---|---|---|
| Rethinking the Route Towards Weakly Supervised Object Localization | Code | 1 |
| Evaluating Weakly Supervised Object Localization Methods Right | Code | 1 |
| Building Calibrated Deep Models via Uncertainty Matching with Auxiliary Interval Predictors | Code | 1 |
| Min-max Entropy for Weakly Supervised Pointwise Localization | Code | 1 |
| CutMix: Regularization Strategy to Train Strong Classifiers with Localizable Features | Code | 1 |
| Learning to Augment Synthetic Images for Sim2Real Policy Transfer | Code | 1 |
| Unsupervised Traffic Accident Detection in First-Person Videos | Code | 1 |
| Transfer learning for time series classification | Code | 1 |
| Bounding Box Regression with Uncertainty for Accurate Object Detection | Code | 1 |
| Locating Objects Without Bounding Boxes | Code | 1 |
| Frustum PointNets for 3D Object Detection from RGB-D Data | Code | 1 |
| VoxelNet: End-to-End Learning for Point Cloud Based 3D Object Detection | Code | 1 |
| Grad-CAM++: Improved Visual Explanations for Deep Convolutional Networks | Code | 1 |
| Mask R-CNN | Code | 1 |
| Learning Deep Features for Discriminative Localization | Code | 1 |
| LocNet: Improving Localization Accuracy for Object Detection | Code | 1 |
| Efficient Object Localization Using Convolutional Networks | Code | 1 |
| Microsoft COCO: Common Objects in Context | Code | 1 |
| Mask-aware Text-to-Image Retrieval: Referring Expression Segmentation Meets Cross-modal Retrieval | | 0 |
| VoteSplat: Hough Voting Gaussian Splatting for 3D Scene Understanding | | 0 |
| RAG-6DPose: Retrieval-Augmented 6D Pose Estimation via Leveraging CAD as Knowledge Base | | 0 |
| CDP: Towards Robust Autoregressive Visuomotor Policy Learning via Causal Diffusion | | 0 |
| UAV Object Detection and Positioning in a Mining Industrial Metaverse with Custom Geo-Referenced Data | | 0 |
| WoMAP: World Models For Embodied Open-Vocabulary Object Localization | | 0 |
| Multispectral Detection Transformer with Infrared-Centric Sensor Fusion | Code | 0 |
| Ground-V: Teaching VLMs to Ground Complex Instructions in Pixels | | 0 |
| Towards Omnidirectional Reasoning with 360-R1: A Dataset, Benchmark, and GRPO-based Method | | 0 |
| PointArena: Probing Multimodal Grounding Through Language-Guided Pointing | | 0 |
| Towards Accurate State Estimation: Kalman Filter Incorporating Motion Dynamics for 3D Multi-Object Tracking | | 0 |
| Enhancing Satellite Object Localization with Dilated Convolutions and Attention-aided Spatial Pooling | Code | 0 |
| Split Matching for Inductive Zero-shot Semantic Segmentation | | 0 |
| Pro2SAM: Mask Prompt to SAM with Grid Points for Weakly Supervised Object Localization | | 0 |
| Exploring Modality Guidance to Enhance VFM-based Feature Fusion for UDA in 3D Semantic Segmentation | | 0 |
| CFIS-YOLO: A Lightweight Multi-Scale Fusion Network for Edge-Deployable Wood Defect Detection | | 0 |
| Foundation Models for Remote Sensing: An Analysis of MLLMs for Object Localization | | 0 |
| Multi-Object Grounding via Hierarchical Contrastive Siamese Transformers | | 0 |
| POEM: Precise Object-level Editing via MLLM control | | 0 |
| Texture or Semantics? Vision-Language Models Get Lost in Font Recognition | Code | 0 |
| MB-ORES: A Multi-Branch Object Reasoner for Visual Grounding in Remote Sensing | Code | 0 |
| PixelCAM: Pixel Class Activation Mapping for Histology Image Classification and ROI Localization | Code | 0 |
| GLRD: Global-Local Collaborative Reason and Debate with PSL for 3D Open-Vocabulary Detection | | 0 |
| Beyond Object Categories: Multi-Attribute Reference Understanding for Visual Grounding | | 0 |
| xMOD: Cross-Modal Distillation for 2D/3D Multi-Object Discovery from 2D motion | Code | 0 |
| Dr. Splat: Directly Referring 3D Gaussian Splatting via Direct Language Embedding Registration | | 0 |
| MomentSeeker: A Task-Oriented Benchmark For Long-Video Moment Retrieval | | 0 |
| Auto-Prompting SAM for Weakly Supervised Landslide Extraction | | 0 |
| TeD-Loc: Text Distillation for Weakly Supervised Object Localization | Code | 0 |
| Neuromorphic Optical Tracking and Imaging of Randomly Moving Targets through Strongly Scattering Media | | 0 |
| AuxDepthNet: Real-Time Monocular 3D Object Detection with Depth-Sensitive Features | | 0 |
| Cross-Modal Distillation for 2D/3D Multi-Object Discovery from 2D Motion | | 0 |

Benchmark Results

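| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | OSMaN | RGSPL | 32.99 | | Unverified |
| 2 | SUSA | RGSPL | 27.31 | | Unverified |
| 3 | Shanks | RGSPL | 22.85 | | Unverified |
| 4 | CVPR22 | RGSPL | 22.06 | | Unverified |
| 5 | damm1 | RGSPL | 15.96 | | Unverified |
| 6 | 1637 | RGSPL | 14.03 | | Unverified |
| 7 | init. PREVALENT | RGSPL | 13.51 | | Unverified |
| 8 | Airbert | RGSPL | 13.28 | | Unverified |
| 9 | init. OSCAR | RGSPL | 10 | | Unverified |
| 10 | SIA | RGSPL | 9.2 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | VoxelNet | AP | 89.35 | | Unverified |
| 2 | VoxelNet | AP | 89.35 | | Unverified |
| 3 | Frustum PointNets | AP | 88.7 | | Unverified |
| 4 | Frustum PointNets | AP | 81.2 | | Unverified |
| 5 | VoxelNet | AP | 77.47 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | Frustrum-PointPillars | AP | 48.3 | | Unverified |
| 2 | Frustum PointNets | AP | 47.2 | | Unverified |
| 3 | Frustum PointNets | AP | 40.23 | | Unverified |
| 4 | VoxelNet | AP | 38.11 | | Unverified |
| 5 | VoxelNet | AP | 31.51 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | Frustrum-PointPillars | AP | 52.23 | | Unverified |
| 2 | Frustum PointNets | AP | 50.22 | | Unverified |
| 3 | Frustum PointNets | AP | 42.15 | | Unverified |
| 4 | VoxelNet | AP | 40.74 | | Unverified |
| 5 | VoxelNet | AP | 33.69 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | VoxelNet | AP | 77.39 | | Unverified |
| 2 | Frustum PointNets | AP | 75.33 | | Unverified |
| 3 | Frustum PointNets | AP | 62.19 | | Unverified |
| 4 | VoxelNet | AP | 57.73 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | Frustum PointNets | AP | 75.38 | | Unverified |
| 2 | Frustum PointNets | AP | 71.96 | | Unverified |
| 3 | VoxelNet | AP | 66.7 | | Unverified |
| 4 | VoxelNet | AP | 61.22 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | Frustum PointNets | AP | 61.96 | | Unverified |
| 2 | Frustum PointNets | AP | 56.77 | | Unverified |
| 3 | VoxelNet | AP | 54.76 | | Unverified |
| 4 | VoxelNet | AP | 48.36 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | Frustum PointNets | AP | 58.09 | | Unverified |
| 2 | Frustum PointNets | AP | 51.21 | | Unverified |
| 3 | VoxelNet | AP | 46.13 | | Unverified |
| 4 | VoxelNet | AP | 39.48 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | Unified-IOXL | Localization (ablation) | 67 | | Unverified |
| 2 | GPV-2 | Localization (ablation) | 53.6 | | Unverified |
| 3 | Mask R-CNN | Localization (ablation) | 44.7 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | Frustum PointNets | AP | 54.68 | | Unverified |
| 2 | VoxelNe | AP | 50.55 | | Unverified |
| 3 | Frustum PointNets | AP | 50.39 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | GPT4-Vision 4-shot+CoT | Accuracy | 49.7 | | Unverified |
| 2 | Gemini-Pro 4-shot+CoT | Accuracy | 33.9 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | Frustum PointNets | AP | 84 | | Unverified |
| 2 | VoxelNet | AP | 79.26 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | Frustrum-PointPillars | AP | 60.98 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | Hausdorff Loss | Precision | 88.1 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | ours | CorLoc | 41.2 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | ours | CorLoc | 47.45 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | Hausdorff Loss | F-Score | 88.6 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | Hausdorff Loss | Recall | 89.2 | | Unverified |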