Segment Anything (Apr 5, 2023) - Event-based Object Segmentation, Image Segmentation
Visual In-Context Prompting (Nov 22, 2023) - Decoder, Segmentation
GPT4Scene: Understand 3D Scenes from Videos with Vision-Language Models (Jan 2, 2025) - Scene Understanding, text annotation
Groma: Localized Visual Tokenization for Grounding Multimodal Large Language Models (Apr 19, 2024) - Language Modeling, Language Modelling
Set-of-Mark Prompting Unleashes Extraordinary Visual Grounding in GPT-4V (Oct 17, 2023) - Interactive Segmentation, Referring Expression
Generative Multimodal Models are In-Context Learners (Dec 20, 2023) - In-Context Learning, Personalized Image Generation
Visual Prompting via Image Inpainting (Sep 1, 2022) - Colorization, Edge Detection
Explicit Visual Prompting for Universal Foreground Segmentations (May 29, 2023) - Camouflaged Object Segmentation, Defocus Blur Detection
Explicit Visual Prompting for Low-Level Structure Segmentations (Mar 20, 2023) - Camouflaged Object Segmentation, Defocus Blur Detection
Visual-Text Cross Alignment: Refining the Similarity Score in Vision-Language Models (Jun 5, 2024) - Few-Shot Learning, Language Modeling
Attention Prompting on Image for Large Vision-Language Models (Sep 25, 2024) - MM-Vet, Visual Prompting
Tokenize Anything via Prompting (Dec 14, 2023) - Decoder, Visual Prompting
Chameleon: Fast-slow Neuro-symbolic Lane Topology Extraction (Mar 10, 2025) - Autonomous Driving, Scene Understanding
Memory-Space Visual Prompting for Efficient Vision-Language Fine-Tuning (May 9, 2024) - parameter-efficient fine-tuning, Visual Prompting
Exploring Visual Prompts for Adapting Large-Scale Models (Mar 31, 2022) - Visual Prompting
Draw-and-Understand: Leveraging Visual Prompts to Enable MLLMs to Comprehend What You Want (Mar 29, 2024) - Instruction Following, Language Modelling
Improved GUI Grounding via Iterative Narrowing (Nov 18, 2024) - Language Modeling, Language Modelling
BlackVIP: Black-Box Visual Prompting for Robust Transfer Learning (Mar 26, 2023) - Transfer Learning, Visual Prompting
Visual Prompting Upgrades Neural Network Sparsification: A Data-Model Perspective (Dec 3, 2023) - Image Classification, Visual Prompting
Text-Visual Prompting for Efficient 2D Temporal Video Grounding (Mar 9, 2023) - Sentence, Video Grounding
Token Coordinated Prompt Attention is Needed for Visual Prompting (May 5, 2025) - Diversity, Visual Prompting
Understanding and Improving Visual Prompting: A Label-Mapping Perspective (Nov 21, 2022) - Transfer Learning, Visual Prompting
Diversity-Aware Meta Visual Prompting (Mar 14, 2023) - Diversity, Visual Prompting
Dynamic Domains, Dynamic Solutions: DPCore for Continual Test-Time Adaptation (Jun 15, 2024) - Test-time Adaptation, Visual Prompting
AutoVP: An Automated Visual Prompting Framework and Benchmark (Oct 12, 2023) - image-classification, Image Classification
EarthMarker: A Visual Prompting Multi-modal Large Language Model for Remote Sensing (Jul 18, 2024) - Instruction Following, Language Modeling
Selective Visual Prompting in Vision Mamba (Dec 12, 2024) - Mamba, State Space Models
Scaffolding Coordinates to Promote Vision-Language Coordination in Large Multi-Modal Models (Feb 19, 2024) - Visual Prompting
Open-Vocabulary Action Localization with Iterative Visual Prompting (Aug 30, 2024) - Action Localization, Temporal Action Localization
Tune-An-Ellipse: CLIP Has Potential to Find What You Want (Jan 1, 2024) - Object, Referring Expression
Visual Prompting for Generalized Few-shot Segmentation: A Multi-scale Approach (Apr 17, 2024) - Decoder, Generalized Few-Shot Semantic Segmentation
Finding Visual Task Vectors (Apr 8, 2024) - Visual Prompting
Inst-IT: Boosting Multimodal Instance Understanding via Explicit Visual Prompt Instruction Tuning (Dec 4, 2024) - Multimodal Large Language Model, Video Understanding
LoR-VP: Low-Rank Visual Prompting for Efficient Vision Model Adaptation (Feb 2, 2025) - Inductive Bias, Visual Prompting
Visual Instruction Inversion: Image Editing via Visual Prompting (Jul 26, 2023) - Visual Prompting
EZ-CLIP: Efficient Zeroshot Video Action Recognition (Dec 13, 2023) - Action Recognition, GPU
By My Eyes: Grounding Multimodal Large Language Models with Sensor Data via Visual Prompting (Jul 15, 2024) - Visual Prompting
Fine-Grained Visual Prompting (Jun 7, 2023) - Visual Prompting
ViscoNet: Bridging and Harmonizing Visual and Textual Conditioning for ControlNet (Dec 5, 2023) - Image Generation, Person Re-Identification
Exploring the Transferability of Visual Prompting for Multimodal Large Language Models (Apr 17, 2024) - Hallucination, Multimodal Reasoning
OT-VP: Optimal Transport-guided Visual Prompting for Test-Time Adaptation (Jun 12, 2024) - Prompt Learning, Test-time Adaptation
Improving Visual Object Tracking through Visual Prompting (Sep 27, 2024) - Object
GeoSAM: Fine-tuning SAM with Multi-Modal Prompts for Mobility Infrastructure Segmentation (Nov 19, 2023) - Image Segmentation, Large Language Model
UPGPT: Universal Diffusion Model for Person Image Generation, Editing and Pose Transfer (Apr 18, 2023) - Disentanglement, Image Generation
Vision Graph Prompting via Semantic Low-Rank Decomposition (May 7, 2025) - parameter-efficient fine-tuning, Visual Prompting
Visual Prompting for Adversarial Robustness (Oct 12, 2022) - Adversarial Defense, Adversarial Robustness
Towards Universal Text-driven CT Image Segmentation (Mar 8, 2025) - Computed Tomography (CT), Contrastive Learning
Towards Online Multi-Modal Social Interaction Understanding (Mar 25, 2025) - Visual Prompting
UICrit: Enhancing Automated Design Evaluation with a UICritique Dataset (Jul 11, 2024) - Visual Prompting
Adapting Pre-trained Language Models to Vision-Language Tasks via Dynamic Visual Prompting (Jun 1, 2023) - Transfer Learning, Visual Prompting