| i-Code Studio: A Configurable and Composable Framework for Integrative AI | May 23, 2023 | Question Answering, Retrieval | Unverified | 0 |
| DUBLIN -- Document Understanding By Language-Image Network | May 23, 2023 | Document Classification, document understanding | Unverified | 0 |
| Image Manipulation via Multi-Hop Instructions -- A New Dataset and Weakly-Supervised Neuro-Symbolic Approach | May 23, 2023 | Image Manipulation, Question Answering | Unverified | 0 |
| Target-Aware Spatio-Temporal Reasoning via Answering Questions in Dynamics Audio-Visual Scenarios | May 21, 2023 | Audio-visual Question Answering, Audio-Visual Question Answering (AVQA) | Code Available | 0 |
| Visual Question Answering: A Survey on Techniques and Common Trends in Recent Literature | May 18, 2023 | Question Answering, Visual Question Answering | Unverified | 0 |
| IMAD: IMage-Augmented multi-modal Dialogue | May 17, 2023 | Dialogue Generation, Question Answering | Code Available | 0 |
| An Empirical Study on the Language Modal in Visual Question Answering | May 17, 2023 | Question Answering, Visual Question Answering | Unverified | 0 |
| Probing the Role of Positional Information in Vision-Language Models | May 17, 2023 | Contrastive Learning, Image-text matching | Unverified | 0 |
| Semantic Composition in Visually Grounded Language Models | May 15, 2023 | Image Captioning, Inductive Bias | Unverified | 0 |
| OpenViVQA: Task, Dataset, and Multimodal Fusion Models for Visual Question Answering in Vietnamese | May 7, 2023 | Information Retrieval, Question Answering | Code Available | 0 |
| Adaptive loose optimization for robust question answering | May 6, 2023 | Extractive Question-Answering, Machine Reading Comprehension | Code Available | 0 |
| Analysis of Visual Question Answering Algorithms with attention model | May 4, 2023 | Question Answering, Visual Question Answering | Unverified | 0 |
| Making the Most of What You Have: Adapting Pre-trained Visual Language Models in the Low-data Regime | May 3, 2023 | Image Captioning, Question Answering | Unverified | 0 |
| CHIC: Corporate Document for Visual question Answering | May 1, 2023 | Information Retrieval, Question Answering | Unverified | 0 |
| Chain of Thought Prompt Tuning in Vision Language Models | Apr 16, 2023 | Domain Generalization, image-classification | Unverified | 0 |
| PDFVQA: A New Dataset for Real-World VQA on PDF Documents | Apr 13, 2023 | document understanding, Key Information Extraction | Unverified | 0 |
| Advancing Medical Imaging with Language Models: A Journey from N-grams to ChatGPT | Apr 11, 2023 | Diagnostic, Image Captioning | Unverified | 0 |
| Boosting Cross-task Transferability of Adversarial Patches with Visual Relations | Apr 11, 2023 | Image Captioning, Object Recognition | Unverified | 0 |
| CAVL: Learning Contrastive and Adaptive Representations of Vision and Language | Apr 10, 2023 | Image Retrieval, Phrase Grounding | Unverified | 0 |
| Multilingual Augmentation for Robust Visual Question Answering in Remote Sensing Images | Apr 7, 2023 | Contrastive Learning, Question Answering | Unverified | 0 |
| Improving Visual Question Answering Models through Robustness Analysis and In-Context Learning with a Chain of Basic Questions | Apr 6, 2023 | In-Context Learning, Question Answering | Unverified | 0 |
| SC-ML: Self-supervised Counterfactual Metric Learning for Debiased Visual Question Answering | Apr 4, 2023 | counterfactual, Metric Learning | Unverified | 0 |
| Q2ATransformer: Improving Medical VQA via an Answer Querying Decoder | Apr 4, 2023 | Classification, Decoder | Unverified | 0 |
| Locate Then Generate: Bridging Vision and Language with Bounding Box for Scene-Text VQA | Apr 4, 2023 | Answer Generation, Language Modelling | Unverified | 0 |
| Instance-Level Trojan Attacks on Visual Question Answering via Adversarial Learning in Neuron Activation Space | Apr 2, 2023 | Question Answering, Visual Question Answering | Unverified | 0 |