SOTAVerified

Visual Prompting

Visual Prompting is the task of adapting computer vision models through prompts, inspired by the success of text prompting in NLP. A small number of visual prompts is used to turn an unlabeled dataset into a deployed model, which reduces development time for both individual projects and enterprise solutions.

Papers

Showing 51-100 of 127 papers

Title | Status | Hype
Leveraging Large Language Models for Scalable Vector Graphics-Driven Image Understanding | Code | 0
Exploring the Benefits of Visual Prompting in Differential Privacy | Code | 0
UICrit: Enhancing Automated Design Evaluation with a UICritique Dataset | Code | 0
Stepwise Decomposition and Dual-stream Focus: A Novel Approach for Training-free Camouflaged Object Segmentation | Code | 0
When Does Visual Prompting Outperform Linear Probing for Vision-Language Models? A Likelihood Perspective | Code | 0
Unleashing the Power of Visual Prompting At the Pixel Level | Code | 0
Adapting Pre-trained Language Models to Vision-Language Tasks via Dynamic Visual Prompting | Code | 0
ViP-LLaVA: Making Large Multimodal Models Understand Arbitrary Visual Prompts | Code | 0
Uncovering the Hidden Cost of Model Compression | Code | 0
Targeted Visual Prompting for Medical Visual Question Answering | Code | 0
Towards Online Multi-Modal Social Interaction Understanding | Code | 0
Benchmarking Human and Automated Prompting in the Segment Anything Model | Code | 0
VP-NTK: Exploring the Benefits of Visual Prompting in Differentially Private Data Synthesis | - | 0
WeatherGFM: Learning A Weather Generalist Foundation Model via In-context Learning | - | 0
Coarse Correspondences Boost Spatial-Temporal Reasoning in Multimodal Language Model | - | 0
Zoomer: Adaptive Image Focus Optimization for Black-box MLLM | - | 0
3DAxiesPrompts: Unleashing the 3D Spatial Task Capabilities of GPT-4V | - | 0
3DAxisPrompt: Promoting the 3D Grounding and Reasoning in GPT-4o | - | 0
A Comprehensive Evaluation of Multi-Modal Large Language Models for Endoscopy Analysis | - | 0
Affordance-Guided Reinforcement Learning via Visual Prompting | - | 0
Analogist: Out-of-the-box Visual In-Context Learning with Image Diffusion Model | - | 0
Articulate AnyMesh: Open-Vocabulary 3D Articulated Objects Modeling | - | 0
Black-Box Visual Prompt Engineering for Mitigating Object Hallucination in Large Vision Language Models | - | 0
BLINK: Multimodal Large Language Models Can See but Not Perceive | - | 0
Chat2Layout: Interactive 3D Furniture Layout with a Multimodal LLM | - | 0
Cycle-Consistency Uncertainty Estimation for Visual Prompting based One-Shot Defect Segmentation | - | 0
DegustaBot: Zero-Shot Visual Preference Estimation for Personalized Multi-Object Rearrangement | - | 0
DINO-R1: Incentivizing Reasoning Capability in Vision Foundation Models | - | 0
EarthGPT-X: Enabling MLLMs to Flexibly and Comprehensively Understand Multi-Source Remote Sensing Imagery | - | 0
End-to-end Open-vocabulary Video Visual Relationship Detection using Multi-modal Prompting | - | 0
Explore until Confident: Efficient Exploration for Embodied Question Answering | - | 0
Exploring the Zero-Shot Capabilities of Vision-Language Models for Improving Gaze Following | - | 0
Ferret-UI 2: Mastering Universal User Interface Understanding Across Platforms | - | 0
From PowerPoint UI Sketches to Web-Based Applications: Pattern-Driven Code Generation for GIS Dashboard Development Using Knowledge-Augmented LLMs, Context-Aware Visual Prompting, and the React Framework | - | 0
FS-DETR: Few-Shot DEtection TRansformer with prompting and without re-training | - | 0
FVP: Fourier Visual Prompting for Source-Free Unsupervised Domain Adaptation of Medical Image Segmentation | - | 0
Grid-LOGAT: Grid Based Local and Global Area Transcription for Video Question Answering | - | 0
GSON: A Group-based Social Navigation Framework with Large Multimodal Model | - | 0
ImageBrush: Learning Visual In-Context Instructions for Exemplar-Based Image Manipulation | - | 0
Is Temporal Prompting All We Need For Limited Labeled Action Recognition? | - | 0
KUDA: Keypoints to Unify Dynamics Learning and Visual Prompting for Open-Vocabulary Robotic Manipulation | - | 0
LaViP: Language-Grounded Visual Prompts | - | 0
Learning Expressive Prompting With Residuals for Vision Transformers | - | 0
Learning Visual Prompts for Guiding the Attention of Vision Transformers | - | 0
MedFocusCLIP: Improving few shot classification in medical datasets using pixel wise attention | - | 0
Medical Visual Prompting (MVP): A Unified Framework for Versatile and High-Quality Medical Image Segmentation | - | 0
MLLM-Search: A Zero-Shot Approach to Finding People using Multimodal Large Language Models | - | 0
MOKA: Open-World Robotic Manipulation through Mark-Based Visual Prompting | - | 0
MoVL: Exploring Fusion Strategies for the Domain-Adaptive Application of Pretrained Models in Medical Imaging Tasks | - | 0
NVSMask3D: Hard Visual Prompting with Camera Pose Interpolation for 3D Open Vocabulary Instance Segmentation | - | 0
Page 2 of 3

No leaderboard results yet.