| I See Dead People: Gray-Box Adversarial Attack on Image-To-Text Models | Jun 13, 2023 | Adversarial Attack, Decoder | Unverified | 0 | 0 |
| Knowledge Aware Semantic Concept Expansion for Image-Text Matching | Aug 10, 2019 | Common Sense Reasoning, Content-Based Image Retrieval | Unverified | 0 | 0 |
| Knowledge driven Description Synthesis for Floor Plan Interpretation | Mar 15, 2021 | Caption Generation, Descriptive | Unverified | 0 | 0 |
| Semantically Grounded QFormer for Efficient Vision Language Understanding | Nov 13, 2023 | Diversity, Image to text | Unverified | 0 | 0 |
| Learning by Hallucinating: Vision-Language Pre-training with Weak Supervision | Oct 24, 2022 | cross-modal alignment, Cross-Modal Retrieval | Unverified | 0 | 0 |
| Learning Deep Structure-Preserving Image-Text Embeddings | Nov 19, 2015 | Image Retrieval, Image to text | Unverified | 0 | 0 |
| Learning Pseudo-Labeler beyond Noun Concepts for Open-Vocabulary Object Detection | Dec 4, 2023 | Image to text, object-detection | Unverified | 0 | 0 |
| Leveraging AI to Generate Audio for User-generated Content in Video Games | Apr 25, 2024 | Audio Generation, Game Design | Unverified | 0 | 0 |
| Leveraging Unpaired Data for Vision-Language Generative Models via Cycle Consistency | Oct 5, 2023 | Image Generation, Image to text | Unverified | 0 | 0 |
| MatCha: Enhancing Visual Language Pretraining with Math Reasoning and Chart Derendering | Dec 19, 2022 | Chart Question Answering, Data Summarization | Unverified | 0 | 0 |