| Reverse Image Retrieval Cues Parametric Memory in Multimodal LLMs | May 29, 2024 | Image Retrieval, Question Answering | Code Available | 1 |
| PitVQA: Image-grounded Text Embedding LLM for Visual Question Answering in Pituitary Surgery | May 22, 2024 | Question Answering, Visual Question Answering | Code Available | 1 |
| UniRAG: Universal Retrieval Augmentation for Large Vision Language Models | May 16, 2024 | Image Captioning, Image Generation | Code Available | 1 |
| Understanding Figurative Meaning through Explainable Visual Entailment | May 2, 2024 | Question Answering, Visual Entailment | Code Available | 1 |
| TableVQA-Bench: A Visual Question Answering Benchmark on Multiple Table Domains | Apr 30, 2024 | Language Modelling, Large Language Model | Code Available | 1 |
| ViOCRVQA: Novel Benchmark Dataset and Vision Reader for Visual Question Answering by Understanding Vietnamese Text in Images | Apr 29, 2024 | Optical Character Recognition, Optical Character Recognition (OCR) | Code Available | 1 |
| LaPA: Latent Prompt Assist Model For Medical Visual Question Answering | Apr 19, 2024 | Medical Visual Question Answering, Question Answering | Code Available | 1 |
| Look, Listen, and Answer: Overcoming Biases for Audio-Visual Question Answering | Apr 18, 2024 | Audio-visual Question Answering, Audio-Visual Question Answering (AVQA) | Code Available | 1 |
| Enhancing Visual Question Answering through Question-Driven Image Captions as Prompts | Apr 12, 2024 | Image Captioning, Question Answering | Code Available | 1 |
| CausalChaos! Dataset for Comprehensive Causal Action Question Answering Over Longer Causal Chains Grounded in Dynamic Visual Scenes | Apr 1, 2024 | Causal Discovery, Causal Discovery in Video Reasoning | Code Available | 1 |