| CLIP-TD: CLIP Targeted Distillation for Vision-Language Tasks | Jan 15, 2022 | Question Answering, Visual Commonsense Reasoning | Unverified | 0 |
| A Thousand Words Are Worth More Than a Picture: Natural Language-Centric Outside-Knowledge Visual Question Answering | Jan 14, 2022 | Generative Question Answering, Image to text | Unverified | 0 |
| Towards Automated Error Analysis: Learning to Characterize Errors | Jan 13, 2022 | Common Sense Reasoning, Meta-Learning | Unverified | 0 |
| On the Efficacy of Co-Attention Transformer Layers in Visual Question Answering | Jan 11, 2022 | POS, Question Answering | Unverified | 0 |
| Uni-EDEN: Universal Encoder-Decoder Network by Multi-Granular Vision-Language Pre-training | Jan 11, 2022 | Decoder, Image Captioning | Unverified | 0 |
| COIN: Counterfactual Image Generation for VQA Interpretation | Jan 10, 2022 | counterfactual, Image Generation | Unverified | 0 |
| Interactive Attention AI to translate low light photos to captions for night scene understanding in women safety | Jan 4, 2022 | Decoder, Deep Learning | Unverified | 0 |
| Transform-Retrieve-Generate: Natural Language-Centric Outside-Knowledge Visual Question Answering | Jan 1, 2022 | Generative Question Answering, Image to text | Unverified | 0 |
| Maintaining Reasoning Consistency in Compositional Visual Question Answering | Jan 1, 2022 | Question Answering, Visual Question Answering | Code Available | 1 |
| Towards General Purpose Vision Systems: An End-to-End Task-Agnostic Vision-Language Architecture | Jan 1, 2022 | Question Answering, Visual Question Answering | Unverified | 0 |
| V-Doc: Visual Questions Answers With Documents | Jan 1, 2022 | Question Answering, Question Generation | Unverified | 0 |
| Query and Attention Augmentation for Knowledge-Based Explainable Reasoning | Jan 1, 2022 | Question Answering, Visual Question Answering | Code Available | 0 |
| Does CLIP Benefit Visual Question Answering in the Medical Domain as Much as it Does in the General Domain? | Dec 27, 2021 | Articles, Medical Visual Question Answering | Unverified | 0 |
| Multi-Image Visual Question Answering | Dec 27, 2021 | Question Answering, Visual Question Answering | Code Available | 0 |
| LaTr: Layout-Aware Transformer for Scene-Text VQA | Dec 23, 2021 | Optical Character Recognition (OCR), Question Answering | Code Available | 1 |
| Comprehensive Visual Question Answering on Point Clouds through Compositional Scene Manipulation | Dec 22, 2021 | Common Sense Reasoning, Question Answering | Code Available | 1 |
| General Greedy De-bias Learning | Dec 20, 2021 | image-classification, Image Classification | Code Available | 0 |
| Task-Oriented Multi-User Semantic Communications | Dec 19, 2021 | Image Retrieval, Machine Translation | Unverified | 0 |
| Understanding Attention for Vision-and-Language Tasks | Dec 17, 2021 | Image Generation, Image Retrieval | Unverified | 0 |
| Distilled Dual-Encoder Model for Vision-Language Understanding | Dec 16, 2021 | Image to text, model | Code Available | 1 |
| 3D Question Answering | Dec 15, 2021 | 3D geometry, Question Answering | Unverified | 0 |
| Bilateral Cross-Modality Graph Matching Attention for Feature Fusion in Visual Question Answering | Dec 14, 2021 | Graph Matching, Question Answering | Code Available | 1 |
| Dual-Key Multimodal Backdoors for Visual Question Answering | Dec 14, 2021 | Question Answering, Visual Question Answering | Code Available | 1 |
| Improving and Diagnosing Knowledge-Based Visual Question Answering via Entity Enhanced Knowledge Injection | Dec 13, 2021 | Common Sense Reasoning, Knowledge Graph Embeddings | Unverified | 0 |
| Change Detection Meets Visual Question Answering | Dec 12, 2021 | Answer Generation, Change Detection | Code Available | 1 |