| UFO: A UniFied TransfOrmer for Vision-Language Representation Learning | Nov 19, 2021 | Image Captioning, Image-text matching | —Unverified | 0 |
| UIT-Saviors at MEDVQA-GI 2023: Improving Multimodal Learning with Image Enhancement for Gastrointestinal Visual Question Answering | Jul 6, 2023 | Diagnostic, Image Enhancement | —Unverified | 0 |
| Unanswerable Questions about Images and Texts | Jan 25, 2021 | Question Answering, Visual Question Answering | —Unverified | 0 |
| Uncertainty-based Visual Question Answering: Estimating Semantic Inconsistency between Image and Knowledge Base | Nov 16, 2021 | Question Answering, Semantic Similarity | —Unverified | 0 |
| Uncertainty-based Visual Question Answering: Estimating Semantic Inconsistency between Image and Knowledge Base | Jul 27, 2022 | Question Answering, Semantic Similarity | —Unverified | 0 |
| Uncovering Bias in Large Vision-Language Models with Counterfactuals | Mar 29, 2024 | counterfactual, Question Answering | —Unverified | 0 |
| Uncovering Bias in Large Vision-Language Models at Scale with Counterfactuals | May 30, 2024 | counterfactual, Question Answering | —Unverified | 0 |
| Understanding and Constructing Latent Modality Structures in Multi-modal Representation Learning | Mar 10, 2023 | Few-Shot Image Classification, image-classification | —Unverified | 0 |
| Understanding Attention for Vision-and-Language Tasks | Dec 17, 2021 | Image Generation, Image Retrieval | —Unverified | 0 |
| Understanding Complexity in VideoQA via Visual Program Generation | May 19, 2025 | Code Generation, Question Answering | —Unverified | 0 |
| Understanding in Artificial Intelligence | Jan 17, 2021 | Natural Language Understanding, Question Answering | —Unverified | 0 |
| Understanding Information Storage and Transfer in Multi-modal Large Language Models | Jun 6, 2024 | Factual Visual Question Answering, Model Editing | —Unverified | 0 |
| Understanding Knowledge Gaps in Visual Question Answering: Implications for Gap Identification and Testing | Apr 8, 2020 | Diversity, Question Answering | —Unverified | 0 |
| Understanding the Role of Scene Graphs in Visual Question Answering | Jan 14, 2021 | Graph Generation, Question Answering | —Unverified | 0 |
| UnICLAM: Contrastive Representation Learning with Adversarial Masking for Unified and Interpretable Medical Vision Question Answering | Dec 21, 2022 | Data Augmentation, Decision Making | —Unverified | 0 |
| Bidirectional Contrastive Split Learning for Visual Question Answering | Aug 24, 2022 | Adversarial Attack, Backdoor Attack | —Unverified | 0 |
| Uni-EDEN: Universal Encoder-Decoder Network by Multi-Granular Vision-Language Pre-training | Jan 11, 2022 | Decoder, Image Captioning | —Unverified | 0 |
| Unified Multimodal Pre-training and Prompt-based Tuning for Vision-Language Understanding and Generation | Dec 10, 2021 | Image-text matching, Image-text Retrieval | —Unverified | 0 |
| Uni-Mlip: Unified Self-supervision for Medical Vision Language Pre-training | Nov 20, 2024 | Contrastive Learning, image-classification | —Unverified | 0 |
| UNITER: Learning UNiversal Image-TExt Representations | Sep 25, 2019 | Image-text matching, Image-text Retrieval | —Unverified | 0 |
| Un jeu de données pour répondre à des questions visuelles à propos d’entités nommées en utilisant des bases de connaissances (ViQuAE, a Dataset for Knowledge-based Visual Question Answering about Named Entities) | Jun 1, 2022 | Question Answering, Visual Question Answering | —Unverified | 0 |
| Unleashing the Potential of Large Language Model: Zero-shot VQA for Flood Disaster Scenario | Dec 4, 2023 | Language Modeling, Language Modelling | —Unverified | 0 |
| Unshuffling Data for Improved Generalization | Feb 27, 2020 | Clustering, Data Augmentation | —Unverified | 0 |
| Unshuffling Data for Improved Generalization in Visual Question Answering | Jan 1, 2021 | Out-of-Distribution Generalization, Question Answering | —Unverified | 0 |
| Unsupervised Keyword Extraction for Full-sentence VQA | Nov 23, 2019 | Keyword Extraction, Question Answering | —Unverified | 0 |