| TrojVLM: Backdoor Attack Against Vision Language Models | Sep 28, 2024 | Backdoor Attack, Image Captioning | —Unverified | 0 |
| Robotic Environmental State Recognition with Pre-Trained Vision-Language Models and Black-Box Optimization | Sep 26, 2024 | Image to text, Image-to-Text Retrieval | —Unverified | 0 |
| Evaluating authenticity and quality of image captions via sentiment and semantic analyses | Sep 14, 2024 | Image Captioning, Image to text | —Unverified | 0 |
| See or Guess: Counterfactually Regularized Image Captioning | Aug 29, 2024 | Causal Inference, counterfactual | Code Available | 1 |
| UniFashion: A Unified Vision-Language Model for Multimodal Fashion Retrieval and Generation | Aug 21, 2024 | Image Generation, Image Retrieval | Code Available | 1 |
| Ask, Attend, Attack: A Effective Decision-Based Black-Box Targeted Attack for Image-to-Text Models | Aug 16, 2024 | Image to text | —Unverified | 0 |
| In Defense of Lazy Visual Grounding for Open-Vocabulary Semantic Segmentation | Aug 9, 2024 | Image to text, Object | Code Available | 2 |
| Instruction Tuning-free Visual Token Complement for Multimodal LLMs | Aug 9, 2024 | Image Generation, Image to text | —Unverified | 0 |
| GABInsight: Exploring Gender-Activity Binding Bias in Vision-Language Models | Jul 30, 2024 | Image to text, Image-to-Text Retrieval | Code Available | 0 |
| Contrasting Deepfakes Diffusion via Contrastive Learning and Global-Local Similarities | Jul 29, 2024 | Contrastive Learning, DeepFake Detection | Code Available | 2 |