Visual Prompting

Visual Prompting applies the idea of prompting, which proved effective for text in NLP, to computer vision. A few visual prompts are used to turn an unlabeled dataset into a deployed model quickly, which can shorten development time for both individual projects and enterprise solutions.

Papers

Showing 51–100 of 127 papers

Title	Date	Tasks	Status
DINO-R1: Incentivizing Reasoning Capability in Vision Foundation Models	May 29, 2025	Visual Prompting	Unverified
VP Lab: a PEFT-Enabled Visual Prompting Laboratory for Semantic Segmentation	May 21, 2025	parameter-efficient fine-tuning, Semantic Segmentation	Unverified
Zoomer: Adaptive Image Focus Optimization for Black-box MLLM	Apr 30, 2025	Image Captioning, Object Recognition	Unverified
Black-Box Visual Prompt Engineering for Mitigating Object Hallucination in Large Vision Language Models	Apr 30, 2025	Hallucination, Object	Unverified
RadSAM: Segmenting 3D radiological images with a 2D promptable model	Apr 29, 2025	Image Segmentation, Medical Image Segmentation	Unverified
Visual and textual prompts for enhancing emotion recognition in video	Apr 24, 2025	Emotion Recognition, Video Emotion Recognition	Unverified
NVSMask3D: Hard Visual Prompting with Camera Pose Interpolation for 3D Open Vocabulary Instance Segmentation	Apr 20, 2025	3D Instance Segmentation, 3D Open-Vocabulary Instance Segmentation	Unverified
Visual Prompting for One-shot Controllable Video Editing without Inversion	Apr 19, 2025	Video Editing, Visual Prompting	Unverified
EarthGPT-X: Enabling MLLMs to Flexibly and Comprehensively Understand Multi-Source Remote Sensing Imagery	Apr 17, 2025	Large Language Model, Multi-Task Learning	Unverified
Prompt-Guided Attention Head Selection for Focus-Oriented Image Retrieval	Apr 2, 2025	Image Retrieval, Retrieval	Unverified
Is Temporal Prompting All We Need For Limited Labeled Action Recognition?	Apr 2, 2025	Action Recognition, All	Unverified
Towards Online Multi-Modal Social Interaction Understanding	Mar 25, 2025	Visual Prompting	Code Available
VP-NTK: Exploring the Benefits of Visual Prompting in Differentially Private Data Synthesis	Mar 20, 2025	parameter-efficient fine-tuning, Visual Prompting	Unverified
3DAxisPrompt: Promoting the 3D Grounding and Reasoning in GPT-4o	Mar 17, 2025	Logical Reasoning, Prompt Engineering	Unverified
KUDA: Keypoints to Unify Dynamics Learning and Visual Prompting for Open-Vocabulary Robotic Manipulation	Mar 13, 2025	Object, Visual Prompting	Unverified
Towards Ambiguity-Free Spatial Foundation Model: Rethinking and Decoupling Depth Ambiguity	Mar 8, 2025	Depth Estimation, Scene Understanding	Code Available
Towards Universal Text-driven CT Image Segmentation	Mar 8, 2025	Computed Tomography (CT), Contrastive Learning	Code Available
The Role of Background Information in Reducing Object Hallucination in Vision-Language Models: Insights from Cutoff API Prompting	Feb 21, 2025	Hallucination, Object	Unverified
From PowerPoint UI Sketches to Web-Based Applications: Pattern-Driven Code Generation for GIS Dashboard Development Using Knowledge-Augmented LLMs, Context-Aware Visual Prompting, and the React Framework	Feb 12, 2025	Code Generation, RAG	Unverified
Personalization Toolkit: Training Free Personalization of Large Vision Language Models	Feb 4, 2025	RAG, Retrieval	Unverified
Articulate AnyMesh: Open-Vocabulary 3D Articulated Objects Modeling	Feb 4, 2025	Object, Visual Prompting	Unverified
IP-Prompter: Training-Free Theme-Specific Image Generation via Dynamic Visual Prompting	Jan 26, 2025	Diffusion Personalization, Diffusion Personalization Tuning Free	Code Available
MedFocusCLIP : Improving few shot classification in medical datasets using pixel wise attention	Jan 7, 2025	Classification, Fine-Grained Image Classification	Unverified
Query Efficient Black-Box Visual Prompting with Subspace Learning	Jan 1, 2025	Prompt Learning, Visual Prompting	Unverified
Visual Prompting with Iterative Refinement for Design Critique Generation	Dec 22, 2024	Attribute, Visual Prompting	Unverified
Test-time Correction with Human Feedback: An Online 3D Detection System via Visual Prompting	Dec 10, 2024	Autonomous Driving, Visual Prompting	Unverified
MLLM-Search: A Zero-Shot Approach to Finding People using Multimodal Large Language Models	Nov 27, 2024	Person Search, Visual Prompting	Unverified
Prompting the Unseen: Detecting Hidden Backdoors in Black-Box Models	Nov 14, 2024	Visual Prompting	Unverified
WeatherGFM: Learning A Weather Generalist Foundation Model via In-context Learning	Nov 8, 2024	In-Context Learning, Question Answering	Unverified
Benchmarking Human and Automated Prompting in the Segment Anything Model	Oct 29, 2024	Benchmarking, Image Segmentation	Code Available
Ferret-UI 2: Mastering Universal User Interface Understanding Across Platforms	Oct 24, 2024	Diversity, Language Modeling	Unverified
Visual Prompting in LLMs for Enhancing Emotion Recognition	Oct 3, 2024	Emotion Recognition, Visual Prompting	Unverified
GSON: A Group-based Social Navigation Framework with Large Multimodal Model	Sep 26, 2024	Autonomous Vehicles, Motion Planning	Unverified
Cycle-Consistency Uncertainty Estimation for Visual Prompting based One-Shot Defect Segmentation	Sep 21, 2024	Defect Detection, Visual Prompting	Unverified
End-to-end Open-vocabulary Video Visual Relationship Detection using Multi-modal Prompting	Sep 19, 2024	Decoder, Object	Unverified
Visual Prompting in Multimodal Large Language Models: A Survey	Sep 5, 2024	In-Context Learning, Prompt Learning	Unverified
When Does Visual Prompting Outperform Linear Probing for Vision-Language Models? A Likelihood Perspective	Sep 3, 2024	Transfer Learning, Visual Prompting	Code Available
Rethinking Sparse Lexical Representations for Image Retrieval in the Age of Rising Multi-Modal Large Language Models	Aug 29, 2024	Data Augmentation, Image Retrieval	Unverified
Targeted Visual Prompting for Medical Visual Question Answering	Aug 6, 2024	Medical Visual Question Answering, Question Answering	Code Available
Coarse Correspondences Boost Spatial-Temporal Reasoning in Multimodal Language Model	Aug 1, 2024	EgoSchema, Language Modeling	Unverified
Chat2Layout: Interactive 3D Furniture Layout with a Multimodal LLM	Jul 31, 2024	In-Context Learning, Layout Design	Unverified
Affordance-Guided Reinforcement Learning via Visual Prompting	Jul 14, 2024	reinforcement-learning, Reinforcement Learning	Unverified
UICrit: Enhancing Automated Design Evaluation with a UICritique Dataset	Jul 11, 2024	Visual Prompting	Code Available
DegustaBot: Zero-Shot Visual Preference Estimation for Personalized Multi-Object Rearrangement	Jul 11, 2024	Object Rearrangement, Visual Prompting	Unverified
Rethinking Visual Prompting for Multimodal Large Language Models with External Knowledge	Jul 5, 2024	Instance Segmentation, Optical Character Recognition (OCR)	Unverified
Robust Adaptation of Foundation Models with Black-Box Visual Prompting	Jul 4, 2024	Transfer Learning, Visual Prompting	Unverified
Towards Open-World Grasping with Large Vision-Language Models	Jun 26, 2024	Robotic Grasping, Visual Grounding	Unverified
RoboPoint: A Vision-Language Model for Spatial Affordance Prediction for Robotics	Jun 15, 2024	Language Modeling, Language Modelling	Unverified
Exploring the Zero-Shot Capabilities of Vision-Language Models for Improving Gaze Following	Jun 6, 2024	In-Context Learning, Visual Prompting	Unverified
Learning Visual Prompts for Guiding the Attention of Vision Transformers	Jun 5, 2024	Visual Prompting	Unverified

No leaderboard results yet.