| Synthesizing Sentiment-Controlled Feedback For Multimodal Text and Image Data | Feb 12, 2024 | Decoder, Marketing | Code Available | 0 |
| Synthetic Document Question Answering in Hungarian | May 29, 2025 | Optical Character Recognition (OCR), Question Answering | Code Available | 0 |
| Multimodal Explanations: Justifying Decisions and Pointing to the Evidence | Feb 15, 2018 | Activity Recognition, Explainable Models | Code Available | 0 |
| Multimodal Compact Bilinear Pooling for Visual Question Answering and Visual Grounding | Jun 6, 2016 | Phrase Grounding, Visual Grounding | Code Available | 0 |
| Language Models Meet Anomaly Detection for Better Interpretability and Generalizability | Apr 11, 2024 | Anomaly Detection, Language Modelling | Code Available | 0 |
| Open-Ended Visual Question-Answering | Oct 9, 2016 | Question Answering, Sentence | Code Available | 0 |
| ViTextVQA: A Large-Scale Visual Question Answering Dataset for Evaluating Vietnamese Text Comprehension in Images | Apr 16, 2024 | Multimodal Deep Learning, Optical Character Recognition (OCR) | Code Available | 0 |
| Multi-Image Visual Question Answering | Dec 27, 2021 | Question Answering, Visual Question Answering | Code Available | 0 |
| MQA: Answering the Question via Robotic Manipulation | Mar 10, 2020 | Imitation Learning, Question Answering | Code Available | 0 |
| Open-Set Knowledge-Based Visual Question Answering with Inference Paths | Oct 12, 2023 | Knowledge Graphs, Multi-class Classification | Code Available | 0 |
| OpenViVQA: Task, Dataset, and Multimodal Fusion Models for Visual Question Answering in Vietnamese | May 7, 2023 | Information Retrieval, Question Answering | Code Available | 0 |
| T2I-FineEval: Fine-Grained Compositional Metric for Text-to-Image Evaluation | Mar 14, 2025 | Attribute, Question Answering | Code Available | 0 |
| Examining Gender and Racial Bias in Large Vision-Language Models Using a Novel Dataset of Parallel Images | Feb 8, 2024 | Image Captioning, Question Answering | Code Available | 0 |
| Evaluating Fairness in Large Vision-Language Models Across Diverse Demographic Attributes and Prompts | Jun 25, 2024 | Fairness, Question Answering | Code Available | 0 |
| TAB-VCR: Tags and Attributes based Visual Commonsense Reasoning Baselines | Oct 31, 2019 | Attribute, Question Answering | Code Available | 0 |
| TAB-VCR: Tags and Attributes based VCR Baselines | Dec 1, 2019 | Attribute, Question Answering | Code Available | 0 |
| TaCA: Upgrading Your Visual Foundation Model with Task-agnostic Compatible Adapter | Jun 22, 2023 | Question Answering, Retrieval | Code Available | 0 |
| OsmLocator: locating overlapping scatter marks with a non-training generative perspective | Dec 18, 2023 | Clustering, Combinatorial Optimization | Code Available | 0 |
| Modulating early visual processing by language | Jul 2, 2017 | Question Answering, Visual Question Answering | Code Available | 0 |
| Modularized Zero-shot VQA with Pre-trained Models | May 27, 2023 | Object Detection | Code Available | 0 |
| What's in a Question: Using Visual Questions as a Form of Supervision | Apr 12, 2017 | Data Augmentation, Form | Code Available | 0 |
| Outside Knowledge Conversational Video (OKCV) Dataset -- Dialoguing over Videos | Jun 11, 2025 | Question Answering, Visual Question Answering | Code Available | 0 |
| MM-Prompt: Cross-Modal Prompt Tuning for Continual Visual Question Answering | May 26, 2025 | Continual Learning, Question Answering | Code Available | 0 |
| Evaluating Attribute Comprehension in Large Vision-Language Models | Aug 25, 2024 | Attribute, Image-text matching | Code Available | 0 |
| Compressing And Debiasing Vision-Language Pre-Trained Models for Visual Question Answering | Oct 26, 2022 | Question Answering, Visual Question Answering | Code Available | 0 |