When Does Visual Prompting Outperform Linear Probing for Vision-Language Models? A Likelihood Perspective | Sep 3, 2024 | Transfer Learning, Visual Prompting | Code Available | 0
Open-Vocabulary Action Localization with Iterative Visual Prompting | Aug 30, 2024 | Action Localization, Temporal Action Localization | Code Available | 1
Rethinking Sparse Lexical Representations for Image Retrieval in the Age of Rising Multi-Modal Large Language Models | Aug 29, 2024 | Data Augmentation, Image Retrieval | Unverified | 0
Targeted Visual Prompting for Medical Visual Question Answering | Aug 6, 2024 | Medical Visual Question Answering, Question Answering | Code Available | 0
Coarse Correspondences Boost Spatial-Temporal Reasoning in Multimodal Language Model | Aug 1, 2024 | EgoSchema, Language Modeling | Unverified | 0
Chat2Layout: Interactive 3D Furniture Layout with a Multimodal LLM | Jul 31, 2024 | In-Context Learning, Layout Design | Unverified | 0
EarthMarker: A Visual Prompting Multi-modal Large Language Model for Remote Sensing | Jul 18, 2024 | Instruction Following, Language Modeling | Code Available | 1
By My Eyes: Grounding Multimodal Large Language Models with Sensor Data via Visual Prompting | Jul 15, 2024 | Visual Prompting | Code Available | 1
Affordance-Guided Reinforcement Learning via Visual Prompting | Jul 14, 2024 | reinforcement-learning, Reinforcement Learning | Unverified | 0
UICrit: Enhancing Automated Design Evaluation with a UICritique Dataset | Jul 11, 2024 | Visual Prompting | Code Available | 0
DegustaBot: Zero-Shot Visual Preference Estimation for Personalized Multi-Object Rearrangement | Jul 11, 2024 | Object Rearrangement, Visual Prompting | Unverified | 0
Rethinking Visual Prompting for Multimodal Large Language Models with External Knowledge | Jul 5, 2024 | Instance Segmentation, Optical Character Recognition (OCR) | Unverified | 0
Robust Adaptation of Foundation Models with Black-Box Visual Prompting | Jul 4, 2024 | Transfer Learning, Visual Prompting | Unverified | 0
Towards Open-World Grasping with Large Vision-Language Models | Jun 26, 2024 | Robotic Grasping, Visual Grounding | Unverified | 0
Dynamic Domains, Dynamic Solutions: DPCore for Continual Test-Time Adaptation | Jun 15, 2024 | Test-time Adaptation, Visual Prompting | Code Available | 1
RoboPoint: A Vision-Language Model for Spatial Affordance Prediction for Robotics | Jun 15, 2024 | Language Modeling, Language Modelling | Unverified | 0
OT-VP: Optimal Transport-guided Visual Prompting for Test-Time Adaptation | Jun 12, 2024 | Prompt Learning, Test-time Adaptation | Code Available | 1
Exploring the Zero-Shot Capabilities of Vision-Language Models for Improving Gaze Following | Jun 6, 2024 | In-Context Learning, Visual Prompting | Unverified | 0
Visual-Text Cross Alignment: Refining the Similarity Score in Vision-Language Models | Jun 5, 2024 | Few-Shot Learning, Language Modeling | Code Available | 2
Learning Visual Prompts for Guiding the Attention of Vision Transformers | Jun 5, 2024 | Visual Prompting | Unverified | 0
Analogist: Out-of-the-box Visual In-Context Learning with Image Diffusion Model | May 16, 2024 | Image Inpainting, In-Context Learning | Unverified | 0
MoVL: Exploring Fusion Strategies for the Domain-Adaptive Application of Pretrained Models in Medical Imaging Tasks | May 13, 2024 | image-classification, Image Classification | Unverified | 0
Memory-Space Visual Prompting for Efficient Vision-Language Fine-Tuning | May 9, 2024 | parameter-efficient fine-tuning, Visual Prompting | Code Available | 2
Open-Set Video-based Facial Expression Recognition with Human Expression-sensitive Prompting | Apr 26, 2024 | Facial Expression Recognition, Multi-Task Learning | Unverified | 0
Groma: Localized Visual Tokenization for Grounding Multimodal Large Language Models | Apr 19, 2024 | Language Modeling, Language Modelling | Code Available | 4
BLINK: Multimodal Large Language Models Can See but Not Perceive | Apr 18, 2024 | Depth Estimation, Multiple-choice | Unverified | 0
Visual Prompting for Generalized Few-shot Segmentation: A Multi-scale Approach | Apr 17, 2024 | Decoder, Generalized Few-Shot Semantic Segmentation | Code Available | 1
Exploring the Transferability of Visual Prompting for Multimodal Large Language Models | Apr 17, 2024 | Hallucination, Multimodal Reasoning | Code Available | 1
Finding Visual Task Vectors | Apr 8, 2024 | Visual Prompting | Code Available | 1
Medical Visual Prompting (MVP): A Unified Framework for Versatile and High-Quality Medical Image Segmentation | Apr 1, 2024 | Image Segmentation, Medical Image Segmentation | Unverified | 0
Draw-and-Understand: Leveraging Visual Prompts to Enable MLLMs to Comprehend What You Want | Mar 29, 2024 | Instruction Following, Language Modelling | Code Available | 2
Explore until Confident: Efficient Exploration for Embodied Question Answering | Mar 23, 2024 | Conformal Prediction, Efficient Exploration | Unverified | 0
On the low-shot transferability of [V]-Mamba | Mar 15, 2024 | Few-Shot Learning, Mamba | Unverified | 0
MOKA: Open-World Robotic Manipulation through Mark-Based Visual Prompting | Mar 5, 2024 | In-Context Learning, Object Rearrangement | Unverified | 0
Tumor segmentation on whole slide images: training or prompting? | Feb 21, 2024 | Computational Efficiency, Segmentation | Unverified | 0
Scaffolding Coordinates to Promote Vision-Language Coordination in Large Multi-Modal Models | Feb 19, 2024 | Visual Prompting | Code Available | 1
PIVOT: Iterative Visual Prompting Elicits Actionable Knowledge for VLMs | Feb 12, 2024 | Instruction Following, Logical Reasoning | Unverified | 0
Tune-An-Ellipse: CLIP Has Potential to Find What You Want | Jan 1, 2024 | Object, Referring Expression | Code Available | 1
Generative Multimodal Models are In-Context Learners | Dec 20, 2023 | In-Context Learning, Personalized Image Generation | Code Available | 3
LaViP: Language-Grounded Visual Prompts | Dec 18, 2023 | Few-Shot Learning, Transfer Learning | Unverified | 0
3DAxiesPrompts: Unleashing the 3D Spatial Task Capabilities of GPT-4V | Dec 15, 2023 | 3D Object Detection, object-detection | Unverified | 0
Tokenize Anything via Prompting | Dec 14, 2023 | Decoder, Visual Prompting | Code Available | 2
EZ-CLIP: Efficient Zeroshot Video Action Recognition | Dec 13, 2023 | Action Recognition, GPU | Code Available | 1
ViscoNet: Bridging and Harmonizing Visual and Textual Conditioning for ControlNet | Dec 5, 2023 | Image Generation, Person Re-Identification | Code Available | 1
Visual Prompting Upgrades Neural Network Sparsification: A Data-Model Perspective | Dec 3, 2023 | Image Classification, Visual Prompting | Code Available | 1
ViP-LLaVA: Making Large Multimodal Models Understand Arbitrary Visual Prompts | Dec 1, 2023 | Visual Commonsense Reasoning, Visual Prompting | Code Available | 0
T-Rex: Counting by Visual Prompting | Nov 22, 2023 | Object, Object Counting | Unverified | 0
Visual In-Context Prompting | Nov 22, 2023 | Decoder, Segmentation | Code Available | 4
GeoSAM: Fine-tuning SAM with Multi-Modal Prompts for Mobility Infrastructure Segmentation | Nov 19, 2023 | Image Segmentation, Large Language Model | Code Available | 1
Towards Robust and Accurate Visual Prompting | Nov 18, 2023 | Adversarial Robustness, Transfer Learning | Unverified | 0