| Compound Tokens: Channel Fusion for Vision-Language Representation Learning | Dec 2, 2022 | Decoder, Language Modeling | Unverified | 0 |
| Optimizing Explanations by Network Canonization and Hyperparameter Search | Nov 30, 2022 | Explainable Artificial Intelligence (XAI), image-classification | Unverified | 0 |
| PiggyBack: Pretrained Visual Question Answering Environment for Backing up Non-deep Learning Professionals | Nov 29, 2022 | Deep Learning, Question Answering | Unverified | 0 |
| Neuro-Symbolic Spatio-Temporal Reasoning | Nov 28, 2022 | AI Agent, Image Segmentation | Unverified | 0 |
| Look, Read and Ask: Learning to Ask Questions by Reading Text in Images | Nov 23, 2022 | Optical Character Recognition (OCR), Question Answering | Unverified | 0 |
| Cross-Modal Contrastive Learning for Robust Reasoning in VQA | Nov 21, 2022 | Contrastive Learning, Question Answering | Code Available | 0 |
| CL-CrossVQA: A Continual Learning Benchmark for Cross-Domain Visual Question Answering | Nov 19, 2022 | Continual Learning, Question Answering | Unverified | 0 |
| Text-Aware Dual Routing Network for Visual Question Answering | Nov 17, 2022 | Optical Character Recognition, Optical Character Recognition (OCR) | Unverified | 0 |
| AlignVE: Visual Entailment Recognition Based on Alignment Relations | Nov 16, 2022 | Question Answering, Relation | Unverified | 0 |
| Visually Grounded VQA by Lattice-based Retrieval | Nov 15, 2022 | Information Retrieval, Question Answering | Code Available | 0 |
| MF2-MVQA: A Multi-stage Feature Fusion method for Medical Visual Question Answering | Nov 11, 2022 | Medical Visual Question Answering, Question Answering | Unverified | 0 |
| Towards Reasoning-Aware Explainable VQA | Nov 9, 2022 | Decoder, Explanation Generation | Unverified | 0 |
| ERNIE-UniX2: A Unified Cross-lingual Cross-modal Framework for Understanding and Generation | Nov 9, 2022 | Contrastive Learning, Decoder | Unverified | 0 |
| Generalization Differences between End-to-End and Neuro-Symbolic Vision-Language Reasoning Systems | Oct 26, 2022 | Question Answering, Visual Question Answering | Unverified | 0 |
| Compressing And Debiasing Vision-Language Pre-Trained Models for Visual Question Answering | Oct 26, 2022 | Question Answering, Visual Question Answering | Code Available | 0 |
| What's Different between Visual Question Answering for Machine "Understanding" Versus for Accessibility? | Oct 26, 2022 | Benchmarking, Question Answering | Code Available | 0 |
| Learning by Hallucinating: Vision-Language Pre-training with Weak Supervision | Oct 24, 2022 | cross-modal alignment, Cross-Modal Retrieval | Unverified | 0 |
| RSVG: Exploring Data and Models for Visual Grounding on Remote Sensing Data | Oct 23, 2022 | Image Captioning, Image-text Retrieval | Unverified | 0 |
| Image Semantic Relation Generation | Oct 19, 2022 | Image Retrieval, Image Segmentation | Unverified | 0 |
| CPL: Counterfactual Prompt Learning for Vision and Language Models | Oct 19, 2022 | counterfactual, image-classification | Unverified | 0 |
| Aligning MAGMA by Few-Shot Learning and Finetuning | Oct 18, 2022 | Few-Shot Learning, Image Captioning | Unverified | 0 |
| Entity-Focused Dense Passage Retrieval for Outside-Knowledge Visual Question Answering | Oct 18, 2022 | Passage Retrieval, Question Answering | Unverified | 0 |
| Plug-and-Play VQA: Zero-shot VQA by Conjoining Large Pretrained Models with Zero Training | Oct 17, 2022 | Image Captioning, Network Interpretation | Code Available | 0 |
| Multi-Modal Fusion Transformer for Visual Question Answering in Remote Sensing | Oct 10, 2022 | Question Answering, Representation Learning | Unverified | 0 |
| MAMO: Masked Multimodal Modeling for Fine-Grained Vision-Language Representation Learning | Oct 9, 2022 | Image-text Retrieval, multimodal interaction | Unverified | 0 |