| Turbo Learning for Captionbot and Drawingbot | May 21, 2018 | Image Captioning, Image Generation | —Unverified | 0 | 0 |
| Two-stream Hierarchical Similarity Reasoning for Image-text Matching | Mar 10, 2022 | Image-text matching, Image to text | —Unverified | 0 | 0 |
| Uncertainty-based Cross-Modal Retrieval with Probabilistic Representations | Apr 20, 2022 | Cross-Modal Retrieval, Image Retrieval | —Unverified | 0 | 0 |
| Understanding the Effect of using Semantically Meaningful Tokens for Visual Representation Learning | May 26, 2024 | Image to text, Image-to-Text Retrieval | —Unverified | 0 | 0 |
| UNITE-FND: Reframing Multimodal Fake News Detection through Unimodal Scene Translation | Feb 16, 2025 | Binary Classification, Fake News Detection | —Unverified | 0 | 0 |
| Using Inter-Sentence Diverse Beam Search to Reduce Redundancy in Visual Storytelling | May 30, 2018 | Image to text, Sentence | —Unverified | 0 | 0 |
| Utilizing Resource-Rich Language Datasets for End-to-End Scene Text Recognition in Resource-Poor Languages | Nov 24, 2021 | Decoder, Image to text | —Unverified | 0 | 0 |
| Vision-Braille: An End-to-End Tool for Chinese Braille Image-to-Text Translation | Jul 8, 2024 | Image to text, Lifelong learning | —Unverified | 0 | 0 |
| Visual Fact Checker: Enabling High-Fidelity Detailed Caption Generation | Apr 30, 2024 | Caption Generation, Hallucination | —Unverified | 0 | 0 |
| When are Lemons Purple? The Concept Association Bias of Vision-Language Models | Dec 22, 2022 | Attribute, image-classification | —Unverified | 0 | 0 |