| LIVE: Learnable In-Context Vector for Visual Question Answering | Jun 19, 2024 | In-Context Learning, Question Answering | Code Available | 1 |
| MFC-Bench: Benchmarking Multimodal Fact-Checking with Large Vision-Language Models | Jun 17, 2024 | Benchmarking, Fact Checking | Code Available | 1 |
| MMNeuron: Discovering Neuron-Level Domain-Specific Interpretation in Multimodal Large Language Model | Jun 17, 2024 | Language Modeling, Language Modelling | Code Available | 1 |
| VANE-Bench: Video Anomaly Evaluation Benchmark for Conversational LMMs | Jun 14, 2024 | Anomaly Detection, Benchmarking | Code Available | 1 |
| Vision-Language Models Meet Meteorology: Developing Models for Extreme Weather Events Detection with Heatmaps | Jun 14, 2024 | Question Answering, Visual Question Answering | Code Available | 1 |
| Advancing High Resolution Vision-Language Models in Biomedicine | Jun 12, 2024 | Language Modeling, Language Modelling | Code Available | 1 |
| VCR: A Task for Pixel-Level Complex Reasoning in Vision Language Models via Restoring Occluded Text | Jun 10, 2024 | Language Modeling, Language Modelling | Code Available | 1 |
| Re-ReST: Reflection-Reinforced Self-Training for Language Agents | Jun 3, 2024 | Code Generation, Image Generation | Code Available | 1 |
| Instruction-Guided Visual Masking | May 30, 2024 | Instruction Following, Visual Grounding | Code Available | 1 |
| Worse than Random? An Embarrassingly Simple Probing Evaluation of Large Multimodal Models in Medical VQA | May 30, 2024 | Diagnostic, Medical Diagnosis | Code Available | 1 |
| Reverse Image Retrieval Cues Parametric Memory in Multimodal LLMs | May 29, 2024 | Image Retrieval, Question Answering | Code Available | 1 |
| PitVQA: Image-grounded Text Embedding LLM for Visual Question Answering in Pituitary Surgery | May 22, 2024 | Question Answering, Visual Question Answering | Code Available | 1 |
| UniRAG: Universal Retrieval Augmentation for Large Vision Language Models | May 16, 2024 | Image Captioning, Image Generation | Code Available | 1 |
| Understanding Figurative Meaning through Explainable Visual Entailment | May 2, 2024 | Question Answering, Visual Entailment | Code Available | 1 |
| TableVQA-Bench: A Visual Question Answering Benchmark on Multiple Table Domains | Apr 30, 2024 | Language Modelling, Large Language Model | Code Available | 1 |
| ViOCRVQA: Novel Benchmark Dataset and Vision Reader for Visual Question Answering by Understanding Vietnamese Text in Images | Apr 29, 2024 | Optical Character Recognition, Optical Character Recognition (OCR) | Code Available | 1 |
| LaPA: Latent Prompt Assist Model For Medical Visual Question Answering | Apr 19, 2024 | Medical Visual Question Answering, Question Answering | Code Available | 1 |
| Look, Listen, and Answer: Overcoming Biases for Audio-Visual Question Answering | Apr 18, 2024 | Audio-visual Question Answering, Audio-Visual Question Answering (AVQA) | Code Available | 1 |
| Enhancing Visual Question Answering through Question-Driven Image Captions as Prompts | Apr 12, 2024 | Image Captioning, Question Answering | Code Available | 1 |
| CausalChaos! Dataset for Comprehensive Causal Action Question Answering Over Longer Causal Chains Grounded in Dynamic Visual Scenes | Apr 1, 2024 | Causal Discovery, Causal Discovery in Video Reasoning | Code Available | 1 |
| JDocQA: Japanese Document Question Answering Dataset for Generative Language Models | Mar 28, 2024 | Hallucination, Question Answering | Code Available | 1 |
| Beyond Embeddings: The Promise of Visual Table in Visual Reasoning | Mar 27, 2024 | Representation Learning, Visual Question Answering | Code Available | 1 |
| Quantifying and Mitigating Unimodal Biases in Multimodal Large Language Models: A Causal Perspective | Mar 27, 2024 | Question Answering, Visual Question Answering | Code Available | 1 |
| IllusionVQA: A Challenging Optical Illusion Dataset for Vision Language Models | Mar 23, 2024 | Common Sense Reasoning, In-Context Learning | Code Available | 1 |
| Language Repository for Long Video Understanding | Mar 21, 2024 | EgoSchema, Question Answering | Code Available | 1 |