SOTAVerified

Object Detection

Papers

Showing 201-250 of 10957 papers

Title | Status | Hype
RAW-Diffusion: RGB-Guided Diffusion Models for High-Fidelity RAW Image Generation | Code | 2
GaussianPretrain: A Simple Unified 3D Gaussian Representation for Visual Pre-training in Autonomous Driving | Code | 2
V2X-R: Cooperative LiDAR-4D Radar Fusion for 3D Object Detection with Denoising Diffusion | Code | 2
Exploiting Unlabeled Data with Multiple Expert Teachers for Open Vocabulary Aerial Object Detection and Its Orientation Adaptation | Code | 2
ImOV3D: Learning Open-Vocabulary Point Clouds 3D Object Detection from Only 2D Images | Code | 2
MonoDGP: Monocular 3D Object Detection with Decoupled-Query and Geometry-Error Priors | Code | 2
DI-MaskDINO: A Joint Object Detection and Instance Segmentation Model | Code | 2
Multiview Scene Graph | Code | 2
Open World Object Detection: A Survey | Code | 2
PointOBB-v2: Towards Simpler, Faster, and Stronger Single Point Supervised Oriented Object Detection | Code | 2
3DGS-DET: Empower 3D Gaussian Splatting with Boundary Guidance and Box-Focused Sampling for 3D Object Detection | Code | 2
HazyDet: Open-source Benchmark for Drone-view Object Detection with Depth-cues in Hazy Scenes | Code | 2
DAOcc: 3D Object Detection Assisted Multi-Sensor Fusion for 3D Occupancy Prediction | Code | 2
A Novel Unified Architecture for Low-Shot Counting by Detection and Segmentation | Code | 2
Source-Free Domain Adaptation for YOLO Object Detection | Code | 2
RockTrack: A 3D Robust Multi-Camera-Ken Multi-Object Tracking Framework | Code | 2
One missing piece in Vision and Language: A Survey on Comics Understanding | Code | 2
UniDet3D: Multi-dataset Indoor 3D Object Detection | Code | 2
UTrack: Multi-Object Tracking with Uncertain Detections | Code | 2
RoboSense: Large-scale Dataset and Benchmark for Egocentric Robot Perception and Navigation in Crowded and Unstructured Environments | Code | 2
GOReloc: Graph-based Object-Level Relocalization for Visual SLAM | Code | 2
Multi-Scale and Detail-Enhanced Segment Anything Model for Salient Object Detection | Code | 2
CAS-ViT: Convolutional Additive Self-attention Vision Transformers for Efficient Mobile Applications | Code | 2
L4DR: LiDAR-4DRadar Fusion for Weather-Robust 3D Object Detection | Code | 2
Visible-Thermal Multiple Object Tracking: Large-scale Video Dataset and Progressive Fusion Approach | Code | 2
Cross-Layer Feature Pyramid Transformer for Small Object Detection in Aerial Images | Code | 2
MonoWAD: Weather-Adaptive Diffusion Model for Robust Monocular 3D Object Detection | Code | 2
COALA: A Practical and Vision-Centric Federated Learning Platform | Code | 2
PartGLEE: A Foundation Model for Recognizing and Parsing Any Objects | Code | 2
ESOD: Efficient Small Object Detection on High-Resolution Images | Code | 2
GroupMamba: Efficient Group-Based Visual State Space Model | Code | 2
GLARE: Low Light Image Enhancement via Generative Latent Feature based Codebook Retrieval | Code | 2
Crowd-SAM: SAM as a Smart Annotator for Object Detection in Crowded Scenes | Code | 2
LaMI-DETR: Open-Vocabulary Detection with Language Model Instruction | Code | 2
OPEN: Object-wise Position Embedding for Multi-view 3D Object Detection | Code | 2
When Pedestrian Detection Meets Multi-Modal Learning: Generalist Model and Benchmark Dataset | Code | 2
Projecting Points to Axes: Oriented Object Detection via Point-Axis Representation | Code | 2
SCSA: Exploring the Synergistic Effects Between Spatial and Channel Attention | Code | 2
Multi-Branch Auxiliary Fusion YOLO with Re-parameterization Heterogeneous Convolutional for accurate object detection | Code | 2
SH17: A Dataset for Human Safety and Personal Protective Equipment Detection in Manufacturing Industry | Code | 2
SegVG: Transferring Object Bounding Box to Segmentation for Visual Grounding | Code | 2
SOOD++: Leveraging Unlabeled Data to Boost Oriented Object Detection | Code | 2
The Surprising Effectiveness of Multimodal Large Language Models for Video Moment Retrieval | Code | 2
LeYOLO, New Scalable and Efficient CNN Architecture for Object Detection | Code | 2
Scaling Efficient Masked Image Modeling on Large Remote Sensing Dataset | Code | 2
Voxel Mamba: Group-Free State Space Models for Point Cloud based 3D Object Detection | Code | 2
EFM3D: A Benchmark for Measuring Progress Towards 3D Egocentric Foundation Models | Code | 2
BEVSpread: Spread Voxel Pooling for Bird's-Eye-View Representation in Vision-based Roadside 3D Object Detection | Code | 2
STAR: A First-Ever Dataset and A Large-Scale Benchmark for Scene Graph Generation in Large-Size Satellite Imagery | Code | 2
EFFOcc: A Minimal Baseline for EFficient Fusion-based 3D Occupancy Network | Code | 2

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | Co-DETR | box mAP | 66 | | Unverified
2 | InternImage-H (M3I Pre-training) | box mAP | 65.5 | | Unverified
3 | M3I Pre-training (InternImage-H) | box mAP | 65.4 | | Unverified
4 | MoCaE | box mAP | 65.1 | | Unverified
5 | Co-DETR (Swin-L) | box mAP | 64.8 | | Unverified
6 | Focal-Stable-DINO (Focal-Huge, no TTA) | box mAP | 64.8 | | Unverified
7 | EVA | box mAP | 64.7 | | Unverified
8 | Group DETR v2 | box mAP | 64.5 | | Unverified
9 | FocalNet-H (DINO) | box mAP | 64.4 | | Unverified
10 | InternImage-XL | box mAP | 64.3 | | Unverified
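The leaderboard ranks models by box mAP. To illustrate what that metric measures, here is a minimal sketch that assumes the common COCO-style definition: predictions are greedily matched to ground-truth boxes by IoU, AP is computed at each IoU threshold from 0.50 to 0.95 in steps of 0.05, and the results are averaged. The function names (`box_iou`, `average_precision`, `map_over_iou_thresholds`) are hypothetical. The sketch handles a single class only; full mAP also averages over classes. It uses all-point (VOC-style) interpolation rather than the 101-point interpolation of the official COCO evaluator, so it will not reproduce leaderboard numbers exactly.

```python
import numpy as np


def box_iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def average_precision(preds, gts, iou_thr):
    """AP for a single class at one IoU threshold (all-point interpolation).

    preds: list of (image_id, score, box); gts: dict image_id -> list of boxes.
    """
    n_gt = sum(len(boxes) for boxes in gts.values())
    if n_gt == 0:
        return 0.0
    matched = {img: [False] * len(boxes) for img, boxes in gts.items()}
    tp = []
    # Visit predictions from highest to lowest confidence.
    for img, _, box in sorted(preds, key=lambda p: -p[1]):
        best_iou, best_j = 0.0, -1
        for j, g in enumerate(gts.get(img, [])):
            iou = box_iou(box, g)
            # Each ground-truth box may be matched at most once.
            if iou > best_iou and not matched[img][j]:
                best_iou, best_j = iou, j
        if best_j >= 0 and best_iou >= iou_thr:
            matched[img][best_j] = True
            tp.append(1.0)
        else:
            tp.append(0.0)  # unmatched or duplicate detection -> false positive
    tp = np.array(tp, dtype=float)
    ctp, cfp = np.cumsum(tp), np.cumsum(1.0 - tp)
    recall = ctp / n_gt
    precision = ctp / np.maximum(ctp + cfp, 1e-9)
    # Pad the curve, then make precision non-increasing in recall.
    mrec = np.concatenate(([0.0], recall, [1.0]))
    mpre = np.concatenate(([0.0], precision, [0.0]))
    for i in range(len(mpre) - 2, -1, -1):
        mpre[i] = max(mpre[i], mpre[i + 1])
    # Sum precision over the recall steps (area under the PR curve).
    idx = np.where(mrec[1:] != mrec[:-1])[0]
    return float(np.sum((mrec[idx + 1] - mrec[idx]) * mpre[idx + 1]))


def map_over_iou_thresholds(preds, gts):
    """Mean AP over IoU thresholds 0.50, 0.55, ..., 0.95 (single class)."""
    thresholds = np.linspace(0.5, 0.95, 10)
    return float(np.mean([average_precision(preds, gts, t) for t in thresholds]))


if __name__ == "__main__":
    gts = {0: [(0, 0, 10, 10)]}
    preds = [(0, 0.9, (0, 0, 10, 10))]
    print(map_over_iou_thresholds(preds, gts))
```

Averaging over strict thresholds such as 0.95 rewards tight localization. For example, a detection with IoU 0.72 counts as a hit only at the five thresholds up to 0.70, so its averaged AP is 0.5 rather than 1.0.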