| InfMLLM: A Unified Framework for Visual-Language Tasks | Nov 12, 2023 | GPU, Image Captioning | Code Available | 1 | 5 |
| InstructionGPT-4: A 200-Instruction Paradigm for Fine-Tuning MiniGPT-4 | Aug 23, 2023 | Instruction Following, Question Answering | Code Available | 1 | 5 |
| Don't Take the Easy Way Out: Ensemble Based Methods for Avoiding Known Dataset Biases | Sep 9, 2019 | Natural Language Inference, Question Answering | Code Available | 1 | 5 |
| Does Vision-and-Language Pretraining Improve Lexical Grounding? | Sep 21, 2021 | Question Answering, Visual Question Answering | Code Available | 1 | 5 |
| A Comparison of Pre-trained Vision-and-Language Models for Multimodal Representation Learning across Medical Images and Reports | Sep 3, 2020 | Image-text Retrieval, Medical Visual Question Answering | Code Available | 1 | 5 |
| Instruction-Guided Visual Masking | May 30, 2024 | Instruction Following, Visual Grounding | Code Available | 1 | 5 |
| DocVQA: A Dataset for VQA on Document Images | Jul 1, 2020 | Question Answering, Reading Comprehension | Code Available | 1 | 5 |
| Improving Selective Visual Question Answering by Learning from Your Peers | Jun 14, 2023 | Question Answering, Visual Question Answering | Code Available | 1 | 5 |
| IllusionVQA: A Challenging Optical Illusion Dataset for Vision Language Models | Mar 23, 2024 | Common Sense Reasoning, In-Context Learning | Code Available | 1 | 5 |
| Distilled Dual-Encoder Model for Vision-Language Understanding | Dec 16, 2021 | Image to text, model | Code Available | 1 | 5 |
| IMPACT: A Large-scale Integrated Multimodal Patent Analysis and Creation Dataset for Design Patents | Dec 10, 2024 | Cross-Modal Retrieval, Image Classification | Code Available | 1 | 5 |
| In Defense of Grid Features for Visual Question Answering | Jan 10, 2020 | Image Captioning, Question Answering | Code Available | 1 | 5 |
| Bilateral Cross-Modality Graph Matching Attention for Feature Fusion in Visual Question Answering | Dec 14, 2021 | Graph Matching, Question Answering | Code Available | 1 | 5 |
| I Can't Believe There's No Images! Learning Visual Tasks Using only Language Supervision | Nov 17, 2022 | Image Captioning, Question Answering | Code Available | 1 | 5 |
| Lever LM: Configuring In-Context Sequence to Lever Large Vision Language Models | Dec 15, 2023 | Image Captioning, In-Context Learning | Code Available | 1 | 5 |
| Boosting Audio Visual Question Answering via Key Semantic-Aware Cues | Jul 30, 2024 | Audio-visual Question Answering, Audio-Visual Question Answering (AVQA) | Code Available | 1 | 5 |
| Answer Mining from a Pool of Images: Towards Retrieval-Based Visual Question Answering | Jun 29, 2023 | Answer Generation, Question Answering | Code Available | 1 | 5 |
| HyperLLaVA: Dynamic Visual and Language Expert Tuning for Multimodal Large Language Models | Mar 20, 2024 | MME, Visual Question Answering | Code Available | 1 | 5 |
| Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering | Jul 25, 2017 | Image Captioning, Visual Question Answering | Code Available | 1 | 5 |
| Hypergraph Transformer: Weakly-supervised Multi-hop Reasoning for Knowledge-based Visual Question Answering | Apr 22, 2022 | Question Answering, Visual Question Answering | Code Available | 1 | 5 |
| I2I: Initializing Adapters with Improvised Knowledge | Apr 4, 2023 | Continual Learning, Question Answering | Code Available | 1 | 5 |
| IconQA: A New Benchmark for Abstract Diagram Understanding and Visual Language Reasoning | Oct 25, 2021 | Arithmetic Reasoning, Mathematical Question Answering | Code Available | 1 | 5 |
| Disentangling 3D Prototypical Networks For Few-Shot Concept Learning | Nov 6, 2020 | 3D geometry, 3D Object Detection | Code Available | 1 | 5 |
| AntifakePrompt: Prompt-Tuned Vision-Language Models are Fake Image Detectors | Oct 26, 2023 | DeepFake Detection, Face Swapping | Code Available | 1 | 5 |
| How Much Can CLIP Benefit Vision-and-Language Tasks? | Jul 13, 2021 | Question Answering, Vision and Language Navigation | Code Available | 1 | 5 |