Visual Prompting

Visual Prompting adapts computer vision models to new tasks through prompts, an approach inspired by the success of text prompting in NLP. With a few visual prompts, an unlabeled dataset can be turned into a deployed model quickly, reducing development time for both individual projects and enterprise solutions.

Papers

Showing 1–50 of 127 papers

Title	Date	Tasks	Status	Hype
Stepwise Decomposition and Dual-stream Focus: A Novel Approach for Training-free Camouflaged Object Segmentation	Jun 7, 2025	Camouflaged Object Segmentation, Feature Correlation	Code Available	0
RSVP: Reasoning Segmentation via Visual Prompting and Multi-modal Chain-of-Thought	Jun 4, 2025	Multimodal Reasoning, Reasoning Segmentation	Unverified	0
Grid-LOGAT: Grid Based Local and Global Area Transcription for Video Question Answering	May 30, 2025	Language Modeling, Language Modelling	Unverified	0
DINO-R1: Incentivizing Reasoning Capability in Vision Foundation Models	May 29, 2025	Visual Prompting	Unverified	0
A Comprehensive Evaluation of Multi-Modal Large Language Models for Endoscopy Analysis	May 29, 2025	Diagnostic, Visual Prompting	Unverified	0
VP Lab: a PEFT-Enabled Visual Prompting Laboratory for Semantic Segmentation	May 21, 2025	parameter-efficient fine-tuning, Semantic Segmentation	Unverified	0
Vision Graph Prompting via Semantic Low-Rank Decomposition	May 7, 2025	parameter-efficient fine-tuning, Visual Prompting	Code Available	1
Token Coordinated Prompt Attention is Needed for Visual Prompting	May 5, 2025	Diversity, Visual Prompting	Code Available	1
Zoomer: Adaptive Image Focus Optimization for Black-box MLLM	Apr 30, 2025	Image Captioning, Object Recognition	Unverified	0
Black-Box Visual Prompt Engineering for Mitigating Object Hallucination in Large Vision Language Models	Apr 30, 2025	Hallucination, Object	Unverified	0
RadSAM: Segmenting 3D radiological images with a 2D promptable model	Apr 29, 2025	Image Segmentation, Medical Image Segmentation	Unverified	0
Visual and textual prompts for enhancing emotion recognition in video	Apr 24, 2025	Emotion Recognition, Video Emotion Recognition	Unverified	0
NVSMask3D: Hard Visual Prompting with Camera Pose Interpolation for 3D Open Vocabulary Instance Segmentation	Apr 20, 2025	3D Instance Segmentation, 3D Open-Vocabulary Instance Segmentation	Unverified	0
Visual Prompting for One-shot Controllable Video Editing without Inversion	Apr 19, 2025	Video Editing, Visual Prompting	Unverified	0
EarthGPT-X: Enabling MLLMs to Flexibly and Comprehensively Understand Multi-Source Remote Sensing Imagery	Apr 17, 2025	Large Language Model, Multi-Task Learning	Unverified	0
Prompt-Guided Attention Head Selection for Focus-Oriented Image Retrieval	Apr 2, 2025	Image Retrieval, Retrieval	Unverified	0
Is Temporal Prompting All We Need For Limited Labeled Action Recognition?	Apr 2, 2025	Action Recognition, All	Unverified	0
Towards Online Multi-Modal Social Interaction Understanding	Mar 25, 2025	Visual Prompting	Code Available	0
VP-NTK: Exploring the Benefits of Visual Prompting in Differentially Private Data Synthesis	Mar 20, 2025	parameter-efficient fine-tuning, Visual Prompting	Unverified	0
3DAxisPrompt: Promoting the 3D Grounding and Reasoning in GPT-4o	Mar 17, 2025	Logical Reasoning, Prompt Engineering	Unverified	0
KUDA: Keypoints to Unify Dynamics Learning and Visual Prompting for Open-Vocabulary Robotic Manipulation	Mar 13, 2025	Object, Visual Prompting	Unverified	0
Chameleon: Fast-slow Neuro-symbolic Lane Topology Extraction	Mar 10, 2025	Autonomous Driving, Scene Understanding	Code Available	2
Towards Universal Text-driven CT Image Segmentation	Mar 8, 2025	Computed Tomography (CT), Contrastive Learning	Code Available	0
Towards Ambiguity-Free Spatial Foundation Model: Rethinking and Decoupling Depth Ambiguity	Mar 8, 2025	Depth Estimation, Scene Understanding	Code Available	0
The Role of Background Information in Reducing Object Hallucination in Vision-Language Models: Insights from Cutoff API Prompting	Feb 21, 2025	Hallucination, Object	Unverified	0
From PowerPoint UI Sketches to Web-Based Applications: Pattern-Driven Code Generation for GIS Dashboard Development Using Knowledge-Augmented LLMs, Context-Aware Visual Prompting, and the React Framework	Feb 12, 2025	Code Generation, RAG	Unverified	0
Personalization Toolkit: Training Free Personalization of Large Vision Language Models	Feb 4, 2025	RAG, Retrieval	Unverified	0
Articulate AnyMesh: Open-Vocabulary 3D Articulated Objects Modeling	Feb 4, 2025	Object, Visual Prompting	Unverified	0
LoR-VP: Low-Rank Visual Prompting for Efficient Vision Model Adaptation	Feb 2, 2025	Inductive Bias, Visual Prompting	Code Available	1
IP-Prompter: Training-Free Theme-Specific Image Generation via Dynamic Visual Prompting	Jan 26, 2025	Diffusion Personalization, Diffusion Personalization Tuning Free	Code Available	0
MedFocusCLIP : Improving few shot classification in medical datasets using pixel wise attention	Jan 7, 2025	Classification, Fine-Grained Image Classification	Unverified	0
GPT4Scene: Understand 3D Scenes from Videos with Vision-Language Models	Jan 2, 2025	Scene Understanding, text annotation	Code Available	4
Query Efficient Black-Box Visual Prompting with Subspace Learning	Jan 1, 2025	Prompt Learning, Visual Prompting	Unverified	0
Visual Prompting with Iterative Refinement for Design Critique Generation	Dec 22, 2024	Attribute, Visual Prompting	Unverified	0
Selective Visual Prompting in Vision Mamba	Dec 12, 2024	Mamba, State Space Models	Code Available	1
Test-time Correction with Human Feedback: An Online 3D Detection System via Visual Prompting	Dec 10, 2024	Autonomous Driving, Visual Prompting	Unverified	0
Inst-IT: Boosting Multimodal Instance Understanding via Explicit Visual Prompt Instruction Tuning	Dec 4, 2024	Multimodal Large Language Model, Video Understanding	Code Available	1
MLLM-Search: A Zero-Shot Approach to Finding People using Multimodal Large Language Models	Nov 27, 2024	Person Search, Visual Prompting	Unverified	0
Improved GUI Grounding via Iterative Narrowing	Nov 18, 2024	Language Modeling, Language Modelling	Code Available	1
Prompting the Unseen: Detecting Hidden Backdoors in Black-Box Models	Nov 14, 2024	Visual Prompting	Unverified	0
WeatherGFM: Learning A Weather Generalist Foundation Model via In-context Learning	Nov 8, 2024	In-Context Learning, Question Answering	Unverified	0
Benchmarking Human and Automated Prompting in the Segment Anything Model	Oct 29, 2024	Benchmarking, Image Segmentation	Code Available	0
Ferret-UI 2: Mastering Universal User Interface Understanding Across Platforms	Oct 24, 2024	Diversity, Language Modeling	Unverified	0
Visual Prompting in LLMs for Enhancing Emotion Recognition	Oct 3, 2024	Emotion Recognition, Visual Prompting	Unverified	0
Improving Visual Object Tracking through Visual Prompting	Sep 27, 2024	Object	Code Available	1
GSON: A Group-based Social Navigation Framework with Large Multimodal Model	Sep 26, 2024	Autonomous Vehicles, Motion Planning	Unverified	0
Attention Prompting on Image for Large Vision-Language Models	Sep 25, 2024	MM-Vet, Visual Prompting	Code Available	2
Cycle-Consistency Uncertainty Estimation for Visual Prompting based One-Shot Defect Segmentation	Sep 21, 2024	Defect Detection, Visual Prompting	Unverified	0
End-to-end Open-vocabulary Video Visual Relationship Detection using Multi-modal Prompting	Sep 19, 2024	Decoder, Object	Unverified	0
Visual Prompting in Multimodal Large Language Models: A Survey	Sep 5, 2024	In-Context Learning, Prompt Learning	Unverified	0
