Segment Anything | Apr 5, 2023 | Event-based Object Segmentation, Image Segmentation | Code Available | 5
GPT4Scene: Understand 3D Scenes from Videos with Vision-Language Models | Jan 2, 2025 | Scene Understanding, text annotation | Code Available | 4
Groma: Localized Visual Tokenization for Grounding Multimodal Large Language Models | Apr 19, 2024 | Language Modeling, Language Modelling | Code Available | 4
Visual In-Context Prompting | Nov 22, 2023 | Decoder, Segmentation | Code Available | 4
Set-of-Mark Prompting Unleashes Extraordinary Visual Grounding in GPT-4V | Oct 17, 2023 | Interactive Segmentation, Referring Expression | Code Available | 4
Generative Multimodal Models are In-Context Learners | Dec 20, 2023 | In-Context Learning, Personalized Image Generation | Code Available | 3
Chameleon: Fast-slow Neuro-symbolic Lane Topology Extraction | Mar 10, 2025 | Autonomous Driving, Scene Understanding | Code Available | 2
Attention Prompting on Image for Large Vision-Language Models | Sep 25, 2024 | MM-Vet, Visual Prompting | Code Available | 2
Visual-Text Cross Alignment: Refining the Similarity Score in Vision-Language Models | Jun 5, 2024 | Few-Shot Learning, Language Modeling | Code Available | 2
Memory-Space Visual Prompting for Efficient Vision-Language Fine-Tuning | May 9, 2024 | parameter-efficient fine-tuning, Visual Prompting | Code Available | 2
Draw-and-Understand: Leveraging Visual Prompts to Enable MLLMs to Comprehend What You Want | Mar 29, 2024 | Instruction Following, Language Modelling | Code Available | 2
Tokenize Anything via Prompting | Dec 14, 2023 | Decoder, Visual Prompting | Code Available | 2
Explicit Visual Prompting for Universal Foreground Segmentations | May 29, 2023 | Camouflaged Object Segmentation, Defocus Blur Detection | Code Available | 2
Explicit Visual Prompting for Low-Level Structure Segmentations | Mar 20, 2023 | Camouflaged Object Segmentation, Defocus Blur Detection | Code Available | 2
Visual Prompting via Image Inpainting | Sep 1, 2022 | Colorization, Edge Detection | Code Available | 2
Exploring Visual Prompts for Adapting Large-Scale Models | Mar 31, 2022 | Visual Prompting | Code Available | 2
Vision Graph Prompting via Semantic Low-Rank Decomposition | May 7, 2025 | parameter-efficient fine-tuning, Visual Prompting | Code Available | 1
Token Coordinated Prompt Attention is Needed for Visual Prompting | May 5, 2025 | Diversity, Visual Prompting | Code Available | 1
LoR-VP: Low-Rank Visual Prompting for Efficient Vision Model Adaptation | Feb 2, 2025 | Inductive Bias, Visual Prompting | Code Available | 1
Selective Visual Prompting in Vision Mamba | Dec 12, 2024 | Mamba, State Space Models | Code Available | 1
Inst-IT: Boosting Multimodal Instance Understanding via Explicit Visual Prompt Instruction Tuning | Dec 4, 2024 | Multimodal Large Language Model, Video Understanding | Code Available | 1
Improved GUI Grounding via Iterative Narrowing | Nov 18, 2024 | Language Modeling, Language Modelling | Code Available | 1
Improving Visual Object Tracking through Visual Prompting | Sep 27, 2024 | Object | Code Available | 1
Open-Vocabulary Action Localization with Iterative Visual Prompting | Aug 30, 2024 | Action Localization, Temporal Action Localization | Code Available | 1
EarthMarker: A Visual Prompting Multi-modal Large Language Model for Remote Sensing | Jul 18, 2024 | Instruction Following, Language Modeling | Code Available | 1
By My Eyes: Grounding Multimodal Large Language Models with Sensor Data via Visual Prompting | Jul 15, 2024 | Visual Prompting | Code Available | 1
Dynamic Domains, Dynamic Solutions: DPCore for Continual Test-Time Adaptation | Jun 15, 2024 | Test-time Adaptation, Visual Prompting | Code Available | 1
OT-VP: Optimal Transport-guided Visual Prompting for Test-Time Adaptation | Jun 12, 2024 | Prompt Learning, Test-time Adaptation | Code Available | 1
Visual Prompting for Generalized Few-shot Segmentation: A Multi-scale Approach | Apr 17, 2024 | Decoder, Generalized Few-Shot Semantic Segmentation | Code Available | 1
Exploring the Transferability of Visual Prompting for Multimodal Large Language Models | Apr 17, 2024 | Hallucination, Multimodal Reasoning | Code Available | 1
Finding Visual Task Vectors | Apr 8, 2024 | Visual Prompting | Code Available | 1
Scaffolding Coordinates to Promote Vision-Language Coordination in Large Multi-Modal Models | Feb 19, 2024 | Visual Prompting | Code Available | 1
Tune-An-Ellipse: CLIP Has Potential to Find What You Want | Jan 1, 2024 | Object, Referring Expression | Code Available | 1
EZ-CLIP: Efficient Zeroshot Video Action Recognition | Dec 13, 2023 | Action Recognition, GPU | Code Available | 1
ViscoNet: Bridging and Harmonizing Visual and Textual Conditioning for ControlNet | Dec 5, 2023 | Image Generation, Person Re-Identification | Code Available | 1
Visual Prompting Upgrades Neural Network Sparsification: A Data-Model Perspective | Dec 3, 2023 | Image Classification, Visual Prompting | Code Available | 1
GeoSAM: Fine-tuning SAM with Multi-Modal Prompts for Mobility Infrastructure Segmentation | Nov 19, 2023 | Image Segmentation, Large Language Model | Code Available | 1
AutoVP: An Automated Visual Prompting Framework and Benchmark | Oct 12, 2023 | image-classification, Image Classification | Code Available | 1
Visual Instruction Inversion: Image Editing via Visual Prompting | Jul 26, 2023 | Visual Prompting | Code Available | 1
Fine-Grained Visual Prompting | Jun 7, 2023 | Visual Prompting | Code Available | 1
UPGPT: Universal Diffusion Model for Person Image Generation, Editing and Pose Transfer | Apr 18, 2023 | Disentanglement, Image Generation | Code Available | 1
BlackVIP: Black-Box Visual Prompting for Robust Transfer Learning | Mar 26, 2023 | Transfer Learning, Visual Prompting | Code Available | 1
Diversity-Aware Meta Visual Prompting | Mar 14, 2023 | Diversity, Visual Prompting | Code Available | 1
Text-Visual Prompting for Efficient 2D Temporal Video Grounding | Mar 9, 2023 | Sentence, Video Grounding | Code Available | 1
Understanding and Improving Visual Prompting: A Label-Mapping Perspective | Nov 21, 2022 | Transfer Learning, Visual Prompting | Code Available | 1
Visual Prompting for Adversarial Robustness | Oct 12, 2022 | Adversarial Defense, Adversarial Robustness | Code Available | 1
Stepwise Decomposition and Dual-stream Focus: A Novel Approach for Training-free Camouflaged Object Segmentation | Jun 7, 2025 | Camouflaged Object Segmentation, Feature Correlation | Code Available | 0
RSVP: Reasoning Segmentation via Visual Prompting and Multi-modal Chain-of-Thought | Jun 4, 2025 | Multimodal Reasoning, Reasoning Segmentation | Unverified | 0
Grid-LOGAT: Grid Based Local and Global Area Transcription for Video Question Answering | May 30, 2025 | Language Modeling, Language Modelling | Unverified | 0
DINO-R1: Incentivizing Reasoning Capability in Vision Foundation Models | May 29, 2025 | Visual Prompting | Unverified | 0