| Analysis of Visual Question Answering Algorithms with attention model | May 4, 2023 | Question Answering, Visual Question Answering | —Unverified | 0 | 0 |
| EmbodiedGPT: Vision-Language Pre-Training via Embodied Chain of Thought | May 24, 2023 | Image Captioning, Language Modelling | —Unverified | 0 | 0 |
| MM1.5: Methods, Analysis & Insights from Multimodal LLM Fine-tuning | Sep 30, 2024 | Mixture-of-Experts, Optical Character Recognition (OCR) | —Unverified | 0 | 0 |
| MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training | Mar 14, 2024 | In-Context Learning, Mixture-of-Experts | —Unverified | 0 | 0 |
| MMAR: Towards Lossless Multi-Modal Auto-Regressive Probabilistic Modeling | Oct 14, 2024 | Denoising, Image Generation | —Unverified | 0 | 0 |
| ELIXR: Towards a general purpose X-ray artificial intelligence system through alignment of large language models and radiology vision encoders | Aug 2, 2023 | Contrastive Learning, Question Answering | —Unverified | 0 | 0 |
| Eliminating the Language Bias for Visual Question Answering with fine-grained Causal Intervention | Oct 14, 2024 | Contrastive Learning, counterfactual | —Unverified | 0 | 0 |
| MMCOMPOSITION: Revisiting the Compositionality of Pre-trained Vision-Language Models | Oct 13, 2024 | Cross-Modal Retrieval, Question Answering | —Unverified | 0 | 0 |
| MMCTAgent: Multi-modal Critical Thinking Agent Framework for Complex Visual Reasoning | May 28, 2024 | Decision Making, Video Understanding | —Unverified | 0 | 0 |
| Eliminating Catastrophic Interference with Biased Competition | Jul 3, 2020 | Question Answering, Visual Question Answering | —Unverified | 0 | 0 |
| MMED: A Multi-domain and Multi-modality Event Dataset | Apr 4, 2019 | Articles, Question Answering | —Unverified | 0 | 0 |
| MME-Finance: A Multimodal Finance Benchmark for Expert-level Understanding and Reasoning | Nov 5, 2024 | MME, Question Answering | —Unverified | 0 | 0 |
| ElectroVizQA: How well do Multi-modal LLMs perform in Electronics Visual Question Answering? | Nov 27, 2024 | Question Answering, Visual Question Answering | —Unverified | 0 | 0 |
| Efficient Multi-modal Large Language Models via Visual Token Grouping | Nov 26, 2024 | Image Captioning, Question Answering | —Unverified | 0 | 0 |
| EfficientLLaVA: Generalizable Auto-Pruning for Large Vision-language Models | Jan 1, 2025 | MM-Vet, Multimodal Reasoning | —Unverified | 0 | 0 |
| EfficientLLaVA: Generalizable Auto-Pruning for Large Vision-language Models | Mar 19, 2025 | MM-Vet, Multimodal Reasoning | —Unverified | 0 | 0 |
| MMIU: Dataset for Visual Intent Understanding in Multimodal Assistants | Oct 13, 2021 | intent-classification, Intent Classification | —Unverified | 0 | 0 |
| MMKB-RAG: A Multi-Modal Knowledge-Based Retrieval-Augmented Generation Framework | Apr 14, 2025 | Question Answering, RAG | —Unverified | 0 | 0 |
| Efficient Few-Shot Continual Learning in Vision-Language Models | Feb 6, 2025 | Continual Learning, Image Captioning | —Unverified | 0 | 0 |
| Where To Look: Focus Regions for Visual Question Answering | Nov 23, 2015 | Question Answering, Visual Question Answering | —Unverified | 0 | 0 |
| AMXFP4: Taming Activation Outliers with Asymmetric Microscaling Floating-Point for 4-bit LLM Inference | Nov 15, 2024 | Quantization, Question Answering | —Unverified | 0 | 0 |
| MM-R^3: On (In-)Consistency of Multi-modal Large Language Models (MLLMs) | Oct 7, 2024 | Question Answering, Visual Question Answering | —Unverified | 0 | 0 |
| Efficient Bilinear Attention-based Fusion for Medical Visual Question Answering | Oct 28, 2024 | Computational Efficiency, Decision Making | —Unverified | 0 | 0 |
| MM-SpuBench: Towards Better Understanding of Spurious Biases in Multimodal LLMs | Jun 24, 2024 | Question Answering, Visual Question Answering | —Unverified | 0 | 0 |
| Efficiency in Focus: LayerNorm as a Catalyst for Fine-tuning Medical Visual Language Pre-trained Models | Apr 25, 2024 | Medical Visual Question Answering, parameter-efficient fine-tuning | —Unverified | 0 | 0 |