| FEDMEKI: A Benchmark for Scaling Medical Foundation Models via Federated Knowledge Injection | Aug 17, 2024 | Federated Learning, Medical Visual Question Answering | Code Available | 0 |
| Federated Document Visual Question Answering: A Pilot Study | May 10, 2024 | Federated Learning, Question Answering | Code Available | 0 |
| OG-SGG: Ontology-Guided Scene Graph Generation. A Case Study in Transfer Learning for Telepresence Robotics | Feb 21, 2022 | BIG-bench Machine Learning, Graph Generation | Code Available | 0 |
| Core Tokensets for Data-efficient Sequential Training of Transformers | Oct 8, 2024 | Image Captioning, image-classification | Code Available | 0 |
| Copy-Move Forgery Detection and Question Answering for Remote Sensing Image | Dec 3, 2024 | Question Answering, Visual Question Answering | Code Available | 0 |
| Multi-Page Document Visual Question Answering using Self-Attention Scoring Mechanism | Apr 29, 2024 | document understanding, GPU | Code Available | 0 |
| OmniFusion Technical Report | Apr 9, 2024 | MM-Vet, TextVQA | Code Available | 0 |
| Multimodal Residual Learning for Visual QA | Jun 5, 2016 | Multiple-choice, Question Answering | Code Available | 0 |
| OmniNet: A unified architecture for multi-modal multi-task learning | Jul 17, 2019 | Image Captioning, Multi-Task Learning | Code Available | 0 |
| Convincing Rationales for Visual Question Answering Reasoning | Feb 6, 2024 | Question Answering, Visual Question Answering | Code Available | 0 |
| AdaVQA: Overcoming Language Priors with Adapted Margin Cosine Loss | May 5, 2021 | Question Answering, Visual Question Answering | Code Available | 0 |
| Multimodal Preference Data Synthetic Alignment with Reward Model | Dec 23, 2024 | 2k, Caption Generation | Code Available | 0 |
| Multimodal Large Language Models and Tunings: Vision, Language, Sensors, Audio, and Beyond | Oct 8, 2024 | Question Answering, Visual Question Answering | Code Available | 0 |
| Visual Text Matters: Improving Text-KVQA with Visual Text Entity Knowledge-aware Large Multimodal Assistant | Oct 24, 2024 | Entity Linking, Question Answering | Code Available | 0 |
| Factor Graph Attention | Apr 11, 2019 | Graph Attention, Question Answering | Code Available | 0 |
| Continual VQA for Disaster Response Systems | Sep 21, 2022 | Disaster Response, Management | Code Available | 0 |
| On Modality Bias Recognition and Reduction | Feb 25, 2022 | Action Recognition, Multi-modal Classification | Code Available | 0 |
| Exploring the Effect of Primitives for Compositional Generalization in Vision-and-Language | Jan 1, 2023 | Question Answering, Self-Supervised Learning | Code Available | 0 |
| Answering Diverse Questions via Text Attached with Key Audio-Visual Clues | Mar 11, 2024 | Audio-visual Question Answering, Audio-Visual Question Answering (AVQA) | Code Available | 0 |
| Context-VQA: Towards Context-Aware and Purposeful Visual Question Answering | Jul 28, 2023 | Question Answering, Visual Question Answering | Code Available | 0 |
| Exploring Modulated Detection Transformer as a Tool for Action Recognition in Videos | Sep 21, 2022 | Action Detection, Action Recognition | Code Available | 0 |
| Contextual Dropout: An Efficient Sample-Dependent Dropout Module | Mar 6, 2021 | image-classification, Image Classification | Code Available | 0 |
| Multi-modal Factorized Bilinear Pooling with Co-Attention Learning for Visual Question Answering | Aug 4, 2017 | Question Answering, Visual Question Answering | Code Available | 0 |
| Consistency of Compositional Generalization across Multiple Levels | Dec 18, 2024 | Meta-Learning, Question Answering | Code Available | 0 |
| What's Different between Visual Question Answering for Machine "Understanding" Versus for Accessibility? | Oct 26, 2022 | Benchmarking, Question Answering | Code Available | 0 |