SOTAVerified

Object

Replace the cat with a British Shorthair cat that has bulging yellow eyes

Papers

Showing 401–450 of 10,696 papers

Title | Status | Hype
M^3-VOS: Multi-Phase, Multi-Transition, and Multi-Scenery Video Object Segmentation | Code | 1
Multiple Object Stitching for Unsupervised Representation Learning | Code | 1
LPOI: Listwise Preference Optimization for Vision Language Models | Code | 1
Locality-Aware Zero-Shot Human-Object Interaction Detection | Code | 1
ReaMOT: A Benchmark and Framework for Reasoning-based Multi-Object Tracking | Code | 1
Object-level Cross-view Geo-localization with Location Enhancement and Multi-Head Cross Attention | Code | 1
StoryReasoning Dataset: Using Chain-of-Thought for Scene Understanding and Grounded Story Generation | Code | 1
Asynchronous Multi-Object Tracking with an Event Camera | Code | 1
A Simple Detector with Frame Dynamics is a Strong Tracker | Code | 1
CDFormer: Cross-Domain Few-Shot Object Detection Transformer Against Feature Confusion | Code | 1
LLM-Empowered Embodied Agent for Memory-Augmented Task Planning in Household Robotics | Code | 1
GrabS: Generative Embodied Agent for 3D Object Segmentation without Scene Supervision | Code | 1
MonoDiff9D: Monocular Category-Level 9D Object Pose Estimation via Diffusion Model | Code | 1
Are We Done with Object-Centric Learning? | Code | 1
PicoPose: Progressive Pixel-to-Pixel Correspondence Learning for Novel Object Pose Estimation | Code | 1
v-CLR: View-Consistent Learning for Open-World Instance Segmentation | Code | 1
DASH: Detection and Assessment of Systematic Hallucinations of VLMs | Code | 1
EagleVision: Object-level Attribute Multimodal LLM for Remote Sensing | Code | 1
BOOTPLACE: Bootstrapped Object Placement with Detection Transformers | Code | 1
Learning Class Prototypes for Unified Sparse Supervised 3D Object Detection | Code | 1
DynOPETs: A Versatile Benchmark for Dynamic Object Pose Estimation and Tracking in Moving Camera Scenarios | Code | 1
CamSAM2: Segment Anything Accurately in Camouflaged Videos | Code | 1
Benchmarking Object Detectors under Real-World Distribution Shifts in Satellite Imagery | Code | 1
Global-Local Tree Search in VLMs for 3D Indoor Scene Generation | Code | 1
GOAL: Global-local Object Alignment Learning | Code | 1
UltraFlwr -- An Efficient Federated Medical and Surgical Object Detection Framework | Code | 1
GIVEPose: Gradual Intra-class Variation Elimination for RGB-based Category-Level Object Pose Estimation | Code | 1
MMR: A Large-scale Benchmark Dataset for Multi-target and Multi-granularity Reasoning Segmentation | Code | 1
Robust Object Detection of Underwater Robot based on Domain Generalization | Code | 1
History-Aware Transformation of ReID Features for Multiple Object Tracking | Code | 1
OmniSTVG: Toward Spatio-Temporal Omni-Object Video Grounding | Code | 1
Learning to Detect Objects from Multi-Agent LiDAR Scans without Manual Labels | Code | 1
SimROD: A Simple Baseline for Raw Object Detection with Global and Local Enhancements | Code | 1
A Data-Centric Revisit of Pre-Trained Vision Models for Robot Learning | Code | 1
DQO-MAP: Dual Quadrics Multi-Object mapping with Gaussian Splatting | Code | 1
Convex Hull-based Algebraic Constraint for Visual Quadric SLAM | Code | 1
Modeling Fine-Grained Hand-Object Dynamics for Egocentric Video Representation Learning | Code | 1
Mitigating Hallucinations in Large Vision-Language Models by Adaptively Constraining Information Flow | Code | 1
Dynamic Markov Blanket Detection for Macroscopic Physics Discovery | Code | 1
C-Drag: Chain-of-Thought Driven Motion Controller for Video Generation | Code | 1
CLIP Under the Microscope: A Fine-Grained Analysis of Multi-Object Representation | Code | 1
Vector-Quantized Vision Foundation Models for Object-Centric Learning | Code | 1
Ev-3DOD: Pushing the Temporal Boundaries of 3D Object Detection with Event Cameras | Code | 1
Cross-domain Few-shot Object Detection with Multi-modal Textual Enrichment | Code | 1
Object-Centric Image to Video Generation with Language Guidance | Code | 1
DA-Mamba: Domain Adaptive Hybrid Mamba-Transformer Based One-Stage Object Detection | Code | 1
Knowing Your Target: Target-Aware Transformer Makes Better Spatio-Temporal Video Grounding | Code | 1
PlaySlot: Learning Inverse Latent Dynamics for Controllable Object-Centric Video Prediction and Planning | Code | 1
SAVE: Self-Attention on Visual Embedding for Zero-Shot Generic Object Counting | Code | 1
TUMTraffic-VideoQA: A Benchmark for Unified Spatio-Temporal Video Understanding in Traffic Scenes | Code | 1
Page 9 of 214

No leaderboard results yet.