
Object

Replace the cat with a British Shorthair cat of the breed with bulging yellow eyes

Papers

Showing 401-450 of 10696 papers

| Title | Status | Hype |
|---|---|---|
| Category-level Meta-learned NeRF Priors for Efficient Object Mapping | | 0 |
| Visual-RFT: Visual Reinforcement Fine-Tuning | Code | 7 |
| Language-Guided Object Search in Agricultural Environments | | 0 |
| AI-Driven Relocation Tracking in Dynamic Kitchen Environments | Code | 0 |
| Modeling Fine-Grained Hand-Object Dynamics for Egocentric Video Representation Learning | Code | 1 |
| EigenActor: Variant Body-Object Interaction Generation Evolved from Invariant Action Basis Reasoning | | 0 |
| Taming Large Multimodal Agents for Ultra-low Bitrate Semantically Disentangled Image Compression | Code | 0 |
| Dynamic Markov Blanket Detection for Macroscopic Physics Discovery | Code | 1 |
| Towards Semantic 3D Hand-Object Interaction Generation via Functional Text Guidance | | 0 |
| Enhancing deep neural networks through complex-valued representations and Kuramoto synchronization dynamics | | 0 |
| Mitigating Hallucinations in Large Vision-Language Models by Adaptively Constraining Information Flow | Code | 1 |
| Vector-Quantized Vision Foundation Models for Object-Centric Learning | Code | 1 |
| Analyzing CLIP's Performance Limitations in Multi-Object Scenarios: A Controlled High-Resolution Study | | 0 |
| BEVDiffuser: Plug-and-Play Diffusion Model for BEV Denoising with Ground-Truth Guidance | | 0 |
| C-Drag: Chain-of-Thought Driven Motion Controller for Video Generation | Code | 1 |
| QORT-Former: Query-optimized Real-time Transformer for Understanding Two Hands Manipulating Objects | | 0 |
| Vision-Encoders (Already) Know What They See: Mitigating Object Hallucination via Simple Fine-Grained CLIPScore | Code | 0 |
| CLIP Under the Microscope: A Fine-Grained Analysis of Multi-Object Representation | Code | 1 |
| MITracker: Multi-View Integration for Visual Object Tracking | | 0 |
| InterMimic: Towards Universal Whole-Body Control for Physics-Based Human-Object Interactions | Code | 3 |
| CoopDETR: A Unified Cooperative Perception Framework for 3D Detection via Object Query | | 0 |
| Ev-3DOD: Pushing the Temporal Boundaries of 3D Object Detection with Event Cameras | Code | 1 |
| Dictionary-based Framework for Interpretable and Consistent Object Parsing | | 0 |
| ObjectVLA: End-to-End Open-World Object Manipulation Without Demonstration | | 0 |
| Spectral-Enhanced Transformers: Leveraging Large-Scale Pretrained Models for Hyperspectral Object Tracking | | 0 |
| A Distributional Treatment of Real2Sim2Real for Object-Centric Agent Adaptation in Vision-Driven Deformable Linear Object Manipulation | | 0 |
| Joint Reconstruction of Spatially-Coherent and Realistic Clothed Humans and Objects from a Single Image | | 0 |
| FetchBot: Object Fetching in Cluttered Shelves via Zero-Shot Sim2Real | | 0 |
| Enhancing Reusability of Learned Skills for Robot Manipulation via Gaze and Bottleneck | | 0 |
| SPARC: Score Prompting and Adaptive Fusion for Zero-Shot Multi-Label Recognition in Vision-Language Models | | 0 |
| V-HOP: Visuo-Haptic 6D Object Pose Tracking | | 0 |
| CRTrack: Low-Light Semi-Supervised Multi-object Tracking Based on Consistency Regularization | Code | 0 |
| Cross-domain Few-shot Object Detection with Multi-modal Textual Enrichment | Code | 1 |
| Geometry-Aware 3D Salient Object Detection Network | | 0 |
| Reasoning about Affordances: Causal and Compositional Reasoning in LLMs | | 0 |
| MQADet: A Plug-and-Play Paradigm for Enhancing Open-Vocabulary Object Detection via Multimodal Question Answering | | 0 |
| The Role of Background Information in Reducing Object Hallucination in Vision-Language Models: Insights from Cutoff API Prompting | | 0 |
| CrossOver: 3D Scene Cross-Modal Alignment | Code | 3 |
| ODVerse33: Is the New YOLO Version Always Better? A Multi Domain benchmark from YOLO v5 to v11 | | 0 |
| Watch Less, Feel More: Sim-to-Real RL for Generalizable Articulated Object Manipulation via Motion Adaptation and Impedance Control | | 0 |
| RAPTOR: Refined Approach for Product Table Object Recognition | | 0 |
| Object-centric Binding in Contrastive Language-Image Pretraining | | 0 |
| MSVCOD:A Large-Scale Multi-Scene Dataset for Video Camouflage Object Detection | | 0 |
| Capturing Rich Behavior Representations: A Dynamic Action Semantic-Aware Graph Transformer for Video Captioning | | 0 |
| MEX: Memory-efficient Approach to Referring Multi-Object Tracking | | 0 |
| Object-Pose Estimation With Neural Population Codes | | 0 |
| YOLOv12: Attention-Centric Real-Time Object Detectors | Code | 7 |
| CAST: Component-Aligned 3D Scene Reconstruction from an RGB Image | | 0 |
| Instance-Level Moving Object Segmentation from a Single Image with Events | | 0 |
| RobuRCDet: Enhancing Robustness of Radar-Camera Fusion in Bird's Eye View for 3D Object Detection | | 0 |
Page 9 of 214

No leaderboard results yet.