Visual Prompting

Visual Prompting applies the prompting idea that proved successful in NLP to computer vision. A few visual prompts are used to turn an unlabeled dataset into a deployed model, which reduces development time for both individual projects and enterprise solutions.

Papers

Showing 51–100 of 127 papers

Title	Date	Tasks	Status	Score
Leveraging Large Language Models for Scalable Vector Graphics-Driven Image Understanding	Jun 9, 2023	Few-Shot Learning, image-classification	Code Available	5
Exploring the Benefits of Visual Prompting in Differential Privacy	Mar 22, 2023	image-classification, Image Classification	Code Available	5
UICrit: Enhancing Automated Design Evaluation with a UICritique Dataset	Jul 11, 2024	Visual Prompting	Code Available	5
Stepwise Decomposition and Dual-stream Focus: A Novel Approach for Training-free Camouflaged Object Segmentation	Jun 7, 2025	Camouflaged Object Segmentation, Feature Correlation	Code Available	5
When Does Visual Prompting Outperform Linear Probing for Vision-Language Models? A Likelihood Perspective	Sep 3, 2024	Transfer Learning, Visual Prompting	Code Available	5
Unleashing the Power of Visual Prompting At the Pixel Level	Dec 20, 2022	Diversity, Visual Prompting	Code Available	5
Adapting Pre-trained Language Models to Vision-Language Tasks via Dynamic Visual Prompting	Jun 1, 2023	Transfer Learning, Visual Prompting	Code Available	5
ViP-LLaVA: Making Large Multimodal Models Understand Arbitrary Visual Prompts	Dec 1, 2023	Visual Commonsense Reasoning, Visual Prompting	Code Available	5
Uncovering the Hidden Cost of Model Compression	Aug 29, 2023	model, Model Compression	Code Available	5
Targeted Visual Prompting for Medical Visual Question Answering	Aug 6, 2024	Medical Visual Question Answering, Question Answering	Code Available	5
Towards Online Multi-Modal Social Interaction Understanding	Mar 25, 2025	Visual Prompting	Code Available	5
Benchmarking Human and Automated Prompting in the Segment Anything Model	Oct 29, 2024	Benchmarking, Image Segmentation	Code Available	5
VP-NTK: Exploring the Benefits of Visual Prompting in Differentially Private Data Synthesis	Mar 20, 2025	parameter-efficient fine-tuning, Visual Prompting	Unverified	0
WeatherGFM: Learning A Weather Generalist Foundation Model via In-context Learning	Nov 8, 2024	In-Context Learning, Question Answering	Unverified	0
Coarse Correspondences Boost Spatial-Temporal Reasoning in Multimodal Language Model	Aug 1, 2024	EgoSchema, Language Modeling	Unverified	0
Zoomer: Adaptive Image Focus Optimization for Black-box MLLM	Apr 30, 2025	Image Captioning, Object Recognition	Unverified	0
3DAxiesPrompts: Unleashing the 3D Spatial Task Capabilities of GPT-4V	Dec 15, 2023	3D Object Detection, object-detection	Unverified	0
3DAxisPrompt: Promoting the 3D Grounding and Reasoning in GPT-4o	Mar 17, 2025	Logical Reasoning, Prompt Engineering	Unverified	0
A Comprehensive Evaluation of Multi-Modal Large Language Models for Endoscopy Analysis	May 29, 2025	Diagnostic, Visual Prompting	Unverified	0
Affordance-Guided Reinforcement Learning via Visual Prompting	Jul 14, 2024	reinforcement-learning, Reinforcement Learning	Unverified	0
Analogist: Out-of-the-box Visual In-Context Learning with Image Diffusion Model	May 16, 2024	Image Inpainting, In-Context Learning	Unverified	0
Articulate AnyMesh: Open-Vocabulary 3D Articulated Objects Modeling	Feb 4, 2025	Object, Visual Prompting	Unverified	0
Black-Box Visual Prompt Engineering for Mitigating Object Hallucination in Large Vision Language Models	Apr 30, 2025	Hallucination, Object	Unverified	0
BLINK: Multimodal Large Language Models Can See but Not Perceive	Apr 18, 2024	Depth Estimation, Multiple-choice	Unverified	0
Chat2Layout: Interactive 3D Furniture Layout with a Multimodal LLM	Jul 31, 2024	In-Context Learning, Layout Design	Unverified	0
Cycle-Consistency Uncertainty Estimation for Visual Prompting based One-Shot Defect Segmentation	Sep 21, 2024	Defect Detection, Visual Prompting	Unverified	0
DegustaBot: Zero-Shot Visual Preference Estimation for Personalized Multi-Object Rearrangement	Jul 11, 2024	Object Rearrangement, Visual Prompting	Unverified	0
DINO-R1: Incentivizing Reasoning Capability in Vision Foundation Models	May 29, 2025	Visual Prompting	Unverified	0
EarthGPT-X: Enabling MLLMs to Flexibly and Comprehensively Understand Multi-Source Remote Sensing Imagery	Apr 17, 2025	Large Language Model, Multi-Task Learning	Unverified	0
End-to-end Open-vocabulary Video Visual Relationship Detection using Multi-modal Prompting	Sep 19, 2024	Decoder, Object	Unverified	0
Explore until Confident: Efficient Exploration for Embodied Question Answering	Mar 23, 2024	Conformal Prediction, Efficient Exploration	Unverified	0
Exploring the Zero-Shot Capabilities of Vision-Language Models for Improving Gaze Following	Jun 6, 2024	In-Context Learning, Visual Prompting	Unverified	0
Ferret-UI 2: Mastering Universal User Interface Understanding Across Platforms	Oct 24, 2024	Diversity, Language Modeling	Unverified	0
From PowerPoint UI Sketches to Web-Based Applications: Pattern-Driven Code Generation for GIS Dashboard Development Using Knowledge-Augmented LLMs, Context-Aware Visual Prompting, and the React Framework	Feb 12, 2025	Code Generation, RAG	Unverified	0
FS-DETR: Few-Shot DEtection TRansformer with prompting and without re-training	Oct 10, 2022	Few-Shot Object Detection, object-detection	Unverified	0
FVP: Fourier Visual Prompting for Source-Free Unsupervised Domain Adaptation of Medical Image Segmentation	Apr 26, 2023	Domain Adaptation, Image Segmentation	Unverified	0
Grid-LOGAT: Grid Based Local and Global Area Transcription for Video Question Answering	May 30, 2025	Language Modeling, Language Modelling	Unverified	0
GSON: A Group-based Social Navigation Framework with Large Multimodal Model	Sep 26, 2024	Autonomous Vehicles, Motion Planning	Unverified	0
ImageBrush: Learning Visual In-Context Instructions for Exemplar-Based Image Manipulation	Aug 2, 2023	Image Manipulation, Pose Transfer	Unverified	0
Is Temporal Prompting All We Need For Limited Labeled Action Recognition?	Apr 2, 2025	Action Recognition, All	Unverified	0
KUDA: Keypoints to Unify Dynamics Learning and Visual Prompting for Open-Vocabulary Robotic Manipulation	Mar 13, 2025	Object, Visual Prompting	Unverified	0
LaViP:Language-Grounded Visual Prompts	Dec 18, 2023	Few-Shot Learning, Transfer Learning	Unverified	0
Learning Expressive Prompting With Residuals for Vision Transformers	Mar 27, 2023	Few-Shot Learning, image-classification	Unverified	0
Learning Visual Prompts for Guiding the Attention of Vision Transformers	Jun 5, 2024	Visual Prompting	Unverified	0
MedFocusCLIP : Improving few shot classification in medical datasets using pixel wise attention	Jan 7, 2025	Classification, Fine-Grained Image Classification	Unverified	0
Medical Visual Prompting (MVP): A Unified Framework for Versatile and High-Quality Medical Image Segmentation	Apr 1, 2024	Image Segmentation, Medical Image Segmentation	Unverified	0
MLLM-Search: A Zero-Shot Approach to Finding People using Multimodal Large Language Models	Nov 27, 2024	Person Search, Visual Prompting	Unverified	0
MOKA: Open-World Robotic Manipulation through Mark-Based Visual Prompting	Mar 5, 2024	In-Context Learning, Object Rearrangement	Unverified	0
MoVL:Exploring Fusion Strategies for the Domain-Adaptive Application of Pretrained Models in Medical Imaging Tasks	May 13, 2024	image-classification, Image Classification	Unverified	0
NVSMask3D: Hard Visual Prompting with Camera Pose Interpolation for 3D Open Vocabulary Instance Segmentation	Apr 20, 2025	3D Instance Segmentation, 3D Open-Vocabulary Instance Segmentation	Unverified	0

No leaderboard results yet.