SOTAVerified

Visual Prompting

Visual Prompting is the task of adapting computer vision models to new problems through prompts, inspired by the success of text prompting in NLP. A few visual prompts are used to turn an unlabeled dataset into a deployed model, reducing development time for both individual projects and enterprise solutions.

Papers

Showing 51-100 of 127 papers

Title | Status | Hype
When Does Visual Prompting Outperform Linear Probing for Vision-Language Models? A Likelihood Perspective | Code | 0
Open-Vocabulary Action Localization with Iterative Visual Prompting | Code | 1
Rethinking Sparse Lexical Representations for Image Retrieval in the Age of Rising Multi-Modal Large Language Models | | 0
Targeted Visual Prompting for Medical Visual Question Answering | Code | 0
Coarse Correspondences Boost Spatial-Temporal Reasoning in Multimodal Language Model | | 0
Chat2Layout: Interactive 3D Furniture Layout with a Multimodal LLM | | 0
EarthMarker: A Visual Prompting Multi-modal Large Language Model for Remote Sensing | Code | 1
By My Eyes: Grounding Multimodal Large Language Models with Sensor Data via Visual Prompting | Code | 1
Affordance-Guided Reinforcement Learning via Visual Prompting | | 0
UICrit: Enhancing Automated Design Evaluation with a UICritique Dataset | Code | 0
DegustaBot: Zero-Shot Visual Preference Estimation for Personalized Multi-Object Rearrangement | | 0
Rethinking Visual Prompting for Multimodal Large Language Models with External Knowledge | | 0
Robust Adaptation of Foundation Models with Black-Box Visual Prompting | | 0
Towards Open-World Grasping with Large Vision-Language Models | | 0
Dynamic Domains, Dynamic Solutions: DPCore for Continual Test-Time Adaptation | Code | 1
RoboPoint: A Vision-Language Model for Spatial Affordance Prediction for Robotics | | 0
OT-VP: Optimal Transport-guided Visual Prompting for Test-Time Adaptation | Code | 1
Exploring the Zero-Shot Capabilities of Vision-Language Models for Improving Gaze Following | | 0
Visual-Text Cross Alignment: Refining the Similarity Score in Vision-Language Models | Code | 2
Learning Visual Prompts for Guiding the Attention of Vision Transformers | | 0
Analogist: Out-of-the-box Visual In-Context Learning with Image Diffusion Model | | 0
MoVL:Exploring Fusion Strategies for the Domain-Adaptive Application of Pretrained Models in Medical Imaging Tasks | | 0
Memory-Space Visual Prompting for Efficient Vision-Language Fine-Tuning | Code | 2
Open-Set Video-based Facial Expression Recognition with Human Expression-sensitive Prompting | | 0
Groma: Localized Visual Tokenization for Grounding Multimodal Large Language Models | Code | 4
BLINK: Multimodal Large Language Models Can See but Not Perceive | | 0
Visual Prompting for Generalized Few-shot Segmentation: A Multi-scale Approach | Code | 1
Exploring the Transferability of Visual Prompting for Multimodal Large Language Models | Code | 1
Finding Visual Task Vectors | Code | 1
Medical Visual Prompting (MVP): A Unified Framework for Versatile and High-Quality Medical Image Segmentation | | 0
Draw-and-Understand: Leveraging Visual Prompts to Enable MLLMs to Comprehend What You Want | Code | 2
Explore until Confident: Efficient Exploration for Embodied Question Answering | | 0
On the low-shot transferability of [V]-Mamba | | 0
MOKA: Open-World Robotic Manipulation through Mark-Based Visual Prompting | | 0
Tumor segmentation on whole slide images: training or prompting? | | 0
Scaffolding Coordinates to Promote Vision-Language Coordination in Large Multi-Modal Models | Code | 1
PIVOT: Iterative Visual Prompting Elicits Actionable Knowledge for VLMs | | 0
Tune-An-Ellipse: CLIP Has Potential to Find What You Want | Code | 1
Generative Multimodal Models are In-Context Learners | Code | 3
LaViP:Language-Grounded Visual Prompts | | 0
3DAxiesPrompts: Unleashing the 3D Spatial Task Capabilities of GPT-4V | | 0
Tokenize Anything via Prompting | Code | 2
EZ-CLIP: Efficient Zeroshot Video Action Recognition | Code | 1
ViscoNet: Bridging and Harmonizing Visual and Textual Conditioning for ControlNet | Code | 1
Visual Prompting Upgrades Neural Network Sparsification: A Data-Model Perspective | Code | 1
ViP-LLaVA: Making Large Multimodal Models Understand Arbitrary Visual Prompts | Code | 0
T-Rex: Counting by Visual Prompting | | 0
Visual In-Context Prompting | Code | 4
GeoSAM: Fine-tuning SAM with Multi-Modal Prompts for Mobility Infrastructure Segmentation | Code | 1
Towards Robust and Accurate Visual Prompting | | 0
Page 2 of 3

No leaderboard results yet.