| I2I: Initializing Adapters with Improvised Knowledge | Apr 4, 2023 | Continual Learning, Question Answering | Code Available | 1 |
| How to Configure Good In-Context Sequence for Visual Question Answering | Dec 4, 2023 | In-Context Learning, Question Answering | Code Available | 1 |
| Attention in Reasoning: Dataset, Analysis, and Modeling | Apr 20, 2022 | Question Answering, Visual Question Answering | Code Available | 1 |
| A Stitch in Time Saves Nine: Small VLM is a Precise Guidance for Accelerating Large VLMs | Dec 4, 2024 | Visual Question Answering | Code Available | 1 |
| Label-Descriptive Patterns and Their Application to Characterizing Classification Errors | Oct 18, 2021 | Descriptive, named-entity-recognition | Code Available | 1 |
| LaKo: Knowledge-driven Visual Question Answering via Late Knowledge-to-Text Injection | Jul 26, 2022 | Decoder, Knowledge Graphs | Code Available | 1 |
| Language-Informed Visual Concept Learning | Dec 6, 2023 | Disentanglement, Novel Concepts | Code Available | 1 |
| Language Prior Is Not the Only Shortcut: A Benchmark for Shortcut Learning in VQA | Oct 10, 2022 | Question Answering, Visual Question Answering | Code Available | 1 |
| ConceptBert: Concept-Aware Representation for Visual Question Answering | Nov 1, 2020 | Common Sense Reasoning, Question Answering | Code Available | 1 |
| Conceptual 12M: Pushing Web-Scale Image-Text Pre-Training To Recognize Long-Tail Visual Concepts | Feb 17, 2021 | Caption Generation, Diversity | Code Available | 1 |
| BiomedCLIP: a multimodal biomedical foundation model pretrained from fifteen million scientific image-text pairs | Mar 2, 2023 | Articles, Medical Visual Question Answering | Code Available | 1 |
| Large Scale Multimodal Classification Using an Ensemble of Transformer Models and Co-Attention | Nov 23, 2020 | Classification, General Classification | Code Available | 1 |
| CVLUE: A New Benchmark Dataset for Chinese Vision-Language Understanding Evaluation | Jul 1, 2024 | Image-text Retrieval, Question Answering | Code Available | 1 |
| Consistency-preserving Visual Question Answering in Medical Imaging | Jun 27, 2022 | Question Answering, Visual Question Answering | Code Available | 1 |
| Hypergraph Transformer: Weakly-supervised Multi-hop Reasoning for Knowledge-based Visual Question Answering | Apr 22, 2022 | Question Answering, Visual Question Answering | Code Available | 1 |
| ConTEXTual Net: A Multimodal Vision-Language Model for Segmentation of Pneumothorax | Mar 2, 2023 | Descriptive, Image Captioning | Code Available | 1 |
| I Can't Believe There's No Images! Learning Visual Tasks Using only Language Supervision | Nov 17, 2022 | Image Captioning, Question Answering | Code Available | 1 |
| Cross-modal Retrieval for Knowledge-based Visual Question Answering | Jan 11, 2024 | Cross-Modal Retrieval, Question Answering | Code Available | 1 |
| Contrast and Classify: Training Robust VQA Models | Oct 13, 2020 | Contrastive Learning, Data Augmentation | Code Available | 1 |
| A Unified End-to-End Retriever-Reader Framework for Knowledge-based VQA | Jun 30, 2022 | Question Answering, Retrieval | Code Available | 1 |
| A Stitch in Time Saves Nine: A Train-Time Regularizing Loss for Improved Neural Network Calibration | Mar 25, 2022 | image-classification, Image Classification | Code Available | 1 |
| CLIP-Guided Vision-Language Pre-training for Question Answering in 3D Scenes | Apr 12, 2023 | Question Answering, Visual Question Answering | Code Available | 1 |
| Cross-modal Information Flow in Multimodal Large Language Models | Nov 27, 2024 | Question Answering, Visual Question Answering | Code Available | 1 |
| How Much Can CLIP Benefit Vision-and-Language Tasks? | Jul 13, 2021 | Question Answering, Vision and Language Navigation | Code Available | 1 |
| Lever LM: Configuring In-Context Sequence to Lever Large Vision Language Models | Dec 15, 2023 | Image Captioning, In-Context Learning | Code Available | 1 |