SOTAVerified

Object Detection

Papers

Showing 301–350 of 10957 papers

Title | Status | Hype
DAMSDet: Dynamic Adaptive Multispectral Detection Transformer with Competitive Query Selection and Adaptive Feature Fusion | Code | 2
FusionVision: A comprehensive approach of 3D object reconstruction and segmentation from RGB-D cameras using YOLO and fast segment anything | Code | 2
DEYO: DETR with YOLO for End-to-End Object Detection | Code | 2
EMIFF: Enhanced Multi-scale Image Feature Fusion for Vehicle-Infrastructure Cooperative 3D Object Detection | Code | 2
WeakSAM: Segment Anything Meets Weakly-supervised Instance-level Recognition | Code | 2
MultiCorrupt: A Multi-Modal Robustness Dataset and Benchmark of LiDAR-Camera Fusion for 3D Object Detection | Code | 2
YOLOv8-AM: YOLOv8 Based on Effective Attention Mechanisms for Pediatric Wrist Fracture Detection | Code | 2
FM-Fusion: Instance-aware Semantic Mapping Boosted by Vision-Language Foundation Models | Code | 2
Ray Denoising: Depth-aware Hard Negative Sampling for Multi-view 3D Object Detection | Code | 2
YOLOPoint Joint Keypoint and Object Detection | Code | 2
HASSOD: Hierarchical Adaptive Self-Supervised Object Detection | Code | 2
Cross-Domain Few-Shot Object Detection via Enhanced Open-Set Object Detector | Code | 2
SHViT: Single-Head Vision Transformer with Memory Efficient Macro Design | Code | 2
MixSup: Mixed-grained Supervision for Label-efficient LiDAR-based 3D Object Detection | Code | 2
LiDAR-PTQ: Post-Training Quantization for Point Cloud 3D Object Detection | Code | 2
Self-supervised Learning of LiDAR 3D Point Clouds via 2D-3D Neural Calibration | Code | 2
Detect-Order-Construct: A Tree Construction based Approach for Hierarchical Document Structure Analysis | Code | 2
Removal then Selection: A Coarse-to-Fine Fusion Perspective for RGB-Infrared Object Detection | Code | 2
A Simple Latent Diffusion Approach for Panoptic Segmentation and Mask Inpainting | Code | 2
Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model | Code | 2
Fine-Grained Prototypes Distillation for Few-Shot Object Detection | Code | 2
WidthFormer: Toward Efficient Transformer-based BEV View Transformation | Code | 2
RoboFusion: Towards Robust Multi-Modal 3D Object Detection via SAM | Code | 2
MS-DETR: Efficient DETR Training with Mixed Supervision | Code | 2
Exploring Orthogonality in Open World Object Detection | Code | 2
VkD: Improving Knowledge Distillation using Orthogonal Projections | Code | 2
Realistic Rainy Weather Simulation for LiDARs in CARLA Simulator | Code | 2
Agent Attention: On the Integration of Softmax and Linear Attention | Code | 2
Towards Automatic Power Battery Detection: New Challenge, Benchmark Dataset and Baseline | Code | 2
Hulk: A Universal Knowledge Translator for Human-Centric Tasks | Code | 2
Aligning and Prompting Everything All at Once for Universal Visual Perception | Code | 2
Segment and Caption Anything | Code | 2
TrackDiffusion: Tracklet-Conditioned Video Generation via Diffusion Models | Code | 2
TransNeXt: Robust Foveal Visual Perception for Vision Transformers | Code | 2
Adapter is All You Need for Tuning Visual Tasks | Code | 2
FlashOcc: Fast and Memory-Efficient Occupancy Prediction via Channel-to-Height Plugin | Code | 2
TransXNet: Learning Both Global and Local Dynamics with a Dual Dynamic Token Mixer for Visual Recognition | Code | 2
Battle of the Backbones: A Large-Scale Comparison of Pretrained Models across Computer Vision Tasks | Code | 2
GenEval: An Object-Focused Framework for Evaluating Text-to-Image Alignment | Code | 2
UniPAD: A Universal Pre-training Paradigm for Autonomous Driving | Code | 2
CoDA: Collaborative Novel Box Discovery and Cross-modal Alignment for Open-vocabulary 3D Object Detection | Code | 2
CLIPSelf: Vision Transformer Distills Itself for Open-Vocabulary Dense Prediction | Code | 2
You Only Look at Once for Real-time and Generic Multi-Task | Code | 2
InstructCV: Instruction-Tuned Text-to-Image Diffusion Models as Vision Generalists | Code | 2
Detect Everything with Few Examples | Code | 2
EPTQ: Enhanced Post-Training Quantization via Hessian-guided Network-wise Optimization | Code | 2
RMT: Retentive Networks Meet Vision Transformers | Code | 2
RaTrack: Moving Object Detection and Tracking with 4D Radar Point Cloud | Code | 2
DFormer: Rethinking RGBD Representation Learning for Semantic Segmentation | Code | 2
DAT++: Spatially Dynamic Vision Transformer with Deformable Attention | Code | 2

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | Co-DETR | box mAP | 66 | | Unverified
2 | InternImage-H (M3I Pre-training) | box mAP | 65.5 | | Unverified
3 | M3I Pre-training (InternImage-H) | box mAP | 65.4 | | Unverified
4 | MoCaE | box mAP | 65.1 | | Unverified
5 | Co-DETR (Swin-L) | box mAP | 64.8 | | Unverified
6 | Focal-Stable-DINO (Focal-Huge, no TTA) | box mAP | 64.8 | | Unverified
7 | EVA | box mAP | 64.7 | | Unverified
8 | Group DETR v2 | box mAP | 64.5 | | Unverified
9 | FocalNet-H (DINO) | box mAP | 64.4 | | Unverified
10 | InternImage-XL | box mAP | 64.3 | | Unverified