| Modality-Aware Integration with Large Language Models for Knowledge-based Visual Question Answering | Feb 20, 2024 | Knowledge Graphs, Question Answering | Unverified | 0 |
| Model Tailor: Mitigating Catastrophic Forgetting in Multi-modal Large Language Models | Feb 19, 2024 | Image Captioning, Question Answering | Unverified | 0 |
| Vision-Flan: Scaling Human-Labeled Tasks in Visual Instruction Tuning | Feb 18, 2024 | Hallucination, Visual Question Answering | Unverified | 0 |
| Aligning Modalities in Vision Large Language Models via Preference Fine-tuning | Feb 18, 2024 | Hallucination, Instruction Following | Code Available | 2 |
| ALLaVA: Harnessing GPT4V-Synthesized Data for Lite Vision-Language Models | Feb 18, 2024 | Language Modelling, Question Answering | Code Available | 3 |
| CoLLaVO: Crayon Large Language and Vision mOdel | Feb 17, 2024 | Large Language Model, model | Code Available | 2 |
| VQAttack: Transferable Adversarial Attacks on Visual Question Answering via Pre-trained Models | Feb 16, 2024 | Adversarial Robustness, Language Modelling | Unverified | 0 |
| II-MMR: Identifying and Improving Multi-modal Multi-hop Reasoning in Visual Question Answering | Feb 16, 2024 | Question Answering, Triplet | Code Available | 0 |
| PaLM2-VAdapter: Progressively Aligned Language Model Makes a Strong Vision-language Adapter | Feb 16, 2024 | Language Modeling, Language Modelling | Unverified | 0 |
| Multi-modal Preference Alignment Remedies Degradation of Visual Instruction Tuning on Language Models | Feb 16, 2024 | Diversity, Instruction Following | Code Available | 1 |
| Prompt-based Personalized Federated Learning for Medical Visual Question Answering | Feb 15, 2024 | Federated Learning, Medical Visual Question Answering | Unverified | 0 |
| Pretraining Vision-Language Model for Difference Visual Question Answering in Longitudinal Chest X-rays | Feb 14, 2024 | Language Modeling, Language Modelling | Code Available | 0 |
| OmniMedVQA: A New Large-Scale Comprehensive Evaluation Benchmark for Medical LVLM | Feb 14, 2024 | Medical Visual Question Answering, Question Answering | Code Available | 4 |
| Learning How To Ask: Cycle-Consistency Refines Prompts in Multimodal Foundation Models | Feb 13, 2024 | Code Generation, HumanEval | Unverified | 0 |
| PreFLMR: Scaling Up Fine-Grained Late-Interaction Multi-modal Retrievers | Feb 13, 2024 | Question Answering, Retrieval | Code Available | 3 |
| Visually Dehallucinative Instruction Generation | Feb 13, 2024 | Hallucination, Language Modeling | Code Available | 0 |
| Visual Question Answering Instruction: Unlocking Multimodal Large Language Model To Domain-Specific Visual Multitasks | Feb 13, 2024 | Language Modeling, Language Modelling | Unverified | 0 |
| PIVOT: Iterative Visual Prompting Elicits Actionable Knowledge for VLMs | Feb 12, 2024 | Instruction Following, Logical Reasoning | Unverified | 0 |
| Synthesizing Sentiment-Controlled Feedback For Multimodal Text and Image Data | Feb 12, 2024 | Decoder, Marketing | Code Available | 0 |
| Prismatic VLMs: Investigating the Design Space of Visually-Conditioned Language Models | Feb 12, 2024 | Hallucination, Object Localization | Code Available | 4 |
| Q-Bench+: A Benchmark for Multi-modal Foundation Models on Low-level Vision from Single Images to Pairs | Feb 11, 2024 | Image Quality Assessment, Question Answering | Code Available | 3 |
| Open-ended VQA benchmarking of Vision-Language models by exploiting Classification datasets and their semantic hierarchy | Feb 11, 2024 | Language Modeling, Open Vocabulary Attribute Detection | Code Available | 1 |
| Gemini Goes to Med School: Exploring the Capabilities of Multimodal Large Language Models on Medical Challenge Problems & Hallucinations | Feb 10, 2024 | Diagnostic, Hallucination | Code Available | 1 |
| Knowledge Graphs Meet Multi-Modal Learning: A Comprehensive Survey | Feb 8, 2024 | Articles, Entity Alignment | Code Available | 3 |
| CIC: A Framework for Culturally-Aware Image Captioning | Feb 8, 2024 | Descriptive, Image Captioning | Unverified | 0 |