
Visual Prompting

Visual Prompting is the task of adapting computer vision models with prompts rather than full retraining, inspired by the success of text prompting in NLP. A few visual prompts (such as points, boxes, marks, or learned pixel patterns) can turn an unlabeled dataset into a deployed model, substantially reducing development time for both individual projects and enterprise solutions.
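One common flavor of this idea, studied in "Exploring Visual Prompts for Adapting Large-Scale Models" below, is a pixel-space prompt: a learnable pattern overlaid on the border of every input image so a frozen model adapts to a new task. The sketch below is a minimal, hypothetical illustration of that mechanism (the function name, image size, and 4-pixel border width are assumptions, not from any listed paper), showing only the forward overlay step, not the optimization of the prompt:

```python
import numpy as np

def apply_padding_prompt(image, prompt, pad=4):
    """Overlay a border-shaped visual prompt onto an image.

    image:  (H, W, C) float array in [0, 1]
    prompt: (H, W, C) float array of (in practice, learned) values;
            only its `pad`-pixel-wide border is applied.
    """
    h, w, _ = image.shape
    mask = np.zeros((h, w, 1), dtype=image.dtype)
    mask[:pad, :] = 1.0   # top border
    mask[-pad:, :] = 1.0  # bottom border
    mask[:, :pad] = 1.0   # left border
    mask[:, -pad:] = 1.0  # right border
    # Prompt pixels replace the border; the image interior is untouched.
    return image * (1.0 - mask) + prompt * mask

# Toy usage: a 32x32 RGB image with a constant-valued prompt border.
img = np.random.rand(32, 32, 3)
prm = np.full((32, 32, 3), 0.5)
out = apply_padding_prompt(img, prm, pad=4)
```

In the actual method, `prompt` would be trained by gradient descent through a frozen backbone; here it is a fixed array purely to show where the prompt lives in the input.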

Papers

Showing 1–50 of 127 papers

| Title | Status | Hype |
| --- | --- | --- |
| Segment Anything | Code | 5 |
| GPT4Scene: Understand 3D Scenes from Videos with Vision-Language Models | Code | 4 |
| Visual In-Context Prompting | Code | 4 |
| Groma: Localized Visual Tokenization for Grounding Multimodal Large Language Models | Code | 4 |
| Set-of-Mark Prompting Unleashes Extraordinary Visual Grounding in GPT-4V | Code | 4 |
| Generative Multimodal Models are In-Context Learners | Code | 3 |
| Visual Prompting via Image Inpainting | Code | 2 |
| Explicit Visual Prompting for Low-Level Structure Segmentations | Code | 2 |
| Chameleon: Fast-slow Neuro-symbolic Lane Topology Extraction | Code | 2 |
| Explicit Visual Prompting for Universal Foreground Segmentations | Code | 2 |
| Visual-Text Cross Alignment: Refining the Similarity Score in Vision-Language Models | Code | 2 |
| Exploring Visual Prompts for Adapting Large-Scale Models | Code | 2 |
| Tokenize Anything via Prompting | Code | 2 |
| Memory-Space Visual Prompting for Efficient Vision-Language Fine-Tuning | Code | 2 |
| Attention Prompting on Image for Large Vision-Language Models | Code | 2 |
| Draw-and-Understand: Leveraging Visual Prompts to Enable MLLMs to Comprehend What You Want | Code | 2 |
| Improved GUI Grounding via Iterative Narrowing | Code | 1 |
| BlackVIP: Black-Box Visual Prompting for Robust Transfer Learning | Code | 1 |
| Visual Prompting for Generalized Few-shot Segmentation: A Multi-scale Approach | Code | 1 |
| Visual Instruction Inversion: Image Editing via Visual Prompting | Code | 1 |
| Visual Prompting for Adversarial Robustness | Code | 1 |
| Visual Prompting Upgrades Neural Network Sparsification: A Data-Model Perspective | Code | 1 |
| Understanding and Improving Visual Prompting: A Label-Mapping Perspective | Code | 1 |
| Dynamic Domains, Dynamic Solutions: DPCore for Continual Test-Time Adaptation | Code | 1 |
| AutoVP: An Automated Visual Prompting Framework and Benchmark | Code | 1 |
| EarthMarker: A Visual Prompting Multi-modal Large Language Model for Remote Sensing | Code | 1 |
| UPGPT: Universal Diffusion Model for Person Image Generation, Editing and Pose Transfer | Code | 1 |
| Token Coordinated Prompt Attention is Needed for Visual Prompting | Code | 1 |
| Text-Visual Prompting for Efficient 2D Temporal Video Grounding | Code | 1 |
| Tune-An-Ellipse: CLIP Has Potential to Find What You Want | Code | 1 |
| ViscoNet: Bridging and Harmonizing Visual and Textual Conditioning for ControlNet | Code | 1 |
| Scaffolding Coordinates to Promote Vision-Language Coordination in Large Multi-Modal Models | Code | 1 |
| Improving Visual Object Tracking through Visual Prompting | Code | 1 |
| OT-VP: Optimal Transport-guided Visual Prompting for Test-Time Adaptation | Code | 1 |
| EZ-CLIP: Efficient Zeroshot Video Action Recognition | Code | 1 |
| By My Eyes: Grounding Multimodal Large Language Models with Sensor Data via Visual Prompting | Code | 1 |
| Finding Visual Task Vectors | Code | 1 |
| Fine-Grained Visual Prompting | Code | 1 |
| Exploring the Transferability of Visual Prompting for Multimodal Large Language Models | Code | 1 |
| Inst-IT: Boosting Multimodal Instance Understanding via Explicit Visual Prompt Instruction Tuning | Code | 1 |
| LoR-VP: Low-Rank Visual Prompting for Efficient Vision Model Adaptation | Code | 1 |
| Open-Vocabulary Action Localization with Iterative Visual Prompting | Code | 1 |
| GeoSAM: Fine-tuning SAM with Multi-Modal Prompts for Mobility Infrastructure Segmentation | Code | 1 |
| Selective Visual Prompting in Vision Mamba | Code | 1 |
| Diversity-Aware Meta Visual Prompting | Code | 1 |
| Vision Graph Prompting via Semantic Low-Rank Decomposition | Code | 1 |
| Explore until Confident: Efficient Exploration for Embodied Question Answering | — | 0 |
| BLINK: Multimodal Large Language Models Can See but Not Perceive | — | 0 |
| End-to-end Open-vocabulary Video Visual Relationship Detection using Multi-modal Prompting | — | 0 |
| Affordance-Guided Reinforcement Learning via Visual Prompting | — | 0 |
Page 1 of 3

No leaderboard results yet.