| Towards Multilingual Audio-Visual Question Answering | Jun 13, 2024 | Audio-visual Question Answering, Audio-Visual Question Answering (AVQA) | Code Available | 0 |
| Right this way: Can VLMs Guide Us to See More to Answer Questions? | Nov 1, 2024 | Question Answering, Visual Question Answering | Code Available | 0 |
| Intrinsic Subgraph Generation for Interpretable Graph based Visual Question Answering | Mar 26, 2024 | Decision Making, Explainable artificial intelligence | Code Available | 0 |
| Diffusion-Refined VQA Annotations for Semi-Supervised Gaze Following | Jun 4, 2024 | Question Answering, Visual Question Answering | Code Available | 0 |
| InternLM-XComposer2-4KHD: A Pioneering Large Vision-Language Model Handling Resolutions from 336 Pixels to 4K HD | Apr 9, 2024 | 4k, Language Modeling | Code Available | 0 |
| An Entropy Clustering Approach for Assessing Visual Question Difficulty | Apr 12, 2020 | Clustering, Question Answering | Code Available | 0 |
| Visual Contexts Clarify Ambiguous Expressions: A Benchmark Dataset | Nov 21, 2024 | Question Answering, Visual Grounding | Code Available | 0 |
| Difficult Task Yes but Simple Task No: Unveiling the Laziness in Multimodal LLMs | Oct 15, 2024 | Image Description, Multiple-choice | Code Available | 0 |
| Visual Coreference Resolution in Visual Dialog using Neural Module Networks | Sep 6, 2018 | Common Sense Reasoning, coreference-resolution | Code Available | 0 |
| BinaryVQA: A Versatile Test Set to Evaluate the Out-of-Distribution Generalization of VQA Models | Jan 28, 2023 | Out-of-Distribution Generalization, Question Answering | Code Available | 0 |
| A Simple Loss Function for Improving the Convergence and Accuracy of Visual Question Answering Models | Aug 2, 2017 | Question Answering, Visual Question Answering | Code Available | 0 |
| Robust Explanations for Visual Question Answering | Jan 23, 2020 | Question Answering, Visual Question Answering | Code Available | 0 |
| Integrating Image Features with Convolutional Sequence-to-sequence Network for Multilingual Visual Question Answering | Mar 22, 2023 | Question Answering, Visual Question Answering | Code Available | 0 |
| A Simple Baseline for Knowledge-Based Visual Question Answering | Oct 20, 2023 | In-Context Learning, Question Answering | Code Available | 0 |
| Differential Attention for Visual Question Answering | Apr 1, 2018 | Question Answering, Visual Question Answering | Code Available | 0 |
| VQA4CIR: Boosting Composed Image Retrieval with Visual Question Answering | Dec 19, 2023 | Image Retrieval, Question Answering | Code Available | 0 |
| Differentiable Outlier Detection Enable Robust Deep Multimodal Analysis | Feb 11, 2023 | Image-text Retrieval, Knowledge Graphs | Code Available | 0 |
| Instruction Makes a Difference | Feb 1, 2024 | Hallucination, Instruction Following | Code Available | 0 |
| Routing Networks and the Challenges of Modular and Compositional Computation | Apr 29, 2019 | Language Modeling, Language Modelling | Code Available | 0 |
| RSAdapter: Adapting Multimodal Models for Remote Sensing Visual Question Answering | Oct 19, 2023 | Image Captioning, Question Answering | Code Available | 0 |
| Incorporating Probing Signals into Multimodal Machine Translation via Visual Question-Answering Pairs | Oct 26, 2023 | Attribute, Machine Translation | Code Available | 0 |
| Did the Model Understand the Question? | May 14, 2018 | model, Question Answering | Code Available | 0 |
| Beyond Raw Videos: Understanding Edited Videos with Large Multimodal Model | Jun 15, 2024 | Question Answering, Video Understanding | Code Available | 0 |
| Improving Zero-shot Visual Question Answering via Large Language Models with Reasoning Question Prompts | Nov 15, 2023 | Question Answering, Sentence | Code Available | 0 |
| Improving the Cross-Lingual Generalisation in Visual Question Answering | Sep 7, 2022 | Cross-Lingual Transfer, Question Answering | Code Available | 0 |