SOTAVerified

Visual Prompting

Visual Prompting applies the prompting paradigm that proved successful for text in NLP to computer vision. A few visual prompts are used to turn an unlabeled dataset into a deployed model, which reduces development time for both individual projects and enterprise solutions.

Papers

Showing 1-25 of 127 papers

Title | Status | Hype
Stepwise Decomposition and Dual-stream Focus: A Novel Approach for Training-free Camouflaged Object Segmentation | Code | 0
RSVP: Reasoning Segmentation via Visual Prompting and Multi-modal Chain-of-Thought | - | 0
Grid-LOGAT: Grid Based Local and Global Area Transcription for Video Question Answering | - | 0
DINO-R1: Incentivizing Reasoning Capability in Vision Foundation Models | - | 0
A Comprehensive Evaluation of Multi-Modal Large Language Models for Endoscopy Analysis | - | 0
VP Lab: a PEFT-Enabled Visual Prompting Laboratory for Semantic Segmentation | - | 0
Vision Graph Prompting via Semantic Low-Rank Decomposition | Code | 1
Token Coordinated Prompt Attention is Needed for Visual Prompting | Code | 1
Zoomer: Adaptive Image Focus Optimization for Black-box MLLM | - | 0
Black-Box Visual Prompt Engineering for Mitigating Object Hallucination in Large Vision Language Models | - | 0
RadSAM: Segmenting 3D radiological images with a 2D promptable model | - | 0
Visual and textual prompts for enhancing emotion recognition in video | - | 0
NVSMask3D: Hard Visual Prompting with Camera Pose Interpolation for 3D Open Vocabulary Instance Segmentation | - | 0
Visual Prompting for One-shot Controllable Video Editing without Inversion | - | 0
EarthGPT-X: Enabling MLLMs to Flexibly and Comprehensively Understand Multi-Source Remote Sensing Imagery | - | 0
Prompt-Guided Attention Head Selection for Focus-Oriented Image Retrieval | - | 0
Is Temporal Prompting All We Need For Limited Labeled Action Recognition? | - | 0
Towards Online Multi-Modal Social Interaction Understanding | Code | 0
VP-NTK: Exploring the Benefits of Visual Prompting in Differentially Private Data Synthesis | - | 0
3DAxisPrompt: Promoting the 3D Grounding and Reasoning in GPT-4o | - | 0
KUDA: Keypoints to Unify Dynamics Learning and Visual Prompting for Open-Vocabulary Robotic Manipulation | - | 0
Chameleon: Fast-slow Neuro-symbolic Lane Topology Extraction | Code | 2
Towards Universal Text-driven CT Image Segmentation | Code | 0
Towards Ambiguity-Free Spatial Foundation Model: Rethinking and Decoupling Depth Ambiguity | Code | 0
The Role of Background Information in Reducing Object Hallucination in Vision-Language Models: Insights from Cutoff API Prompting | - | 0

No leaderboard results yet.