| CommVQA: Situating Visual Question Answering in Communicative Contexts | Feb 22, 2024 | Question Answering, Visual Question Answering | Code Available | 0 |
| Modality-Aware Integration with Large Language Models for Knowledge-based Visual Question Answering | Feb 20, 2024 | Knowledge Graphs, Question Answering | Unverified | 0 |
| Exploring the Frontier of Vision-Language Models: A Survey of Current Methodologies and Future Directions | Feb 20, 2024 | Image Captioning, Question Answering | Unverified | 0 |
| Model Tailor: Mitigating Catastrophic Forgetting in Multi-modal Large Language Models | Feb 19, 2024 | Image Captioning, Question Answering | Unverified | 0 |
| Vision-Flan: Scaling Human-Labeled Tasks in Visual Instruction Tuning | Feb 18, 2024 | Hallucination, Visual Question Answering | Unverified | 0 |
| II-MMR: Identifying and Improving Multi-modal Multi-hop Reasoning in Visual Question Answering | Feb 16, 2024 | Question Answering, Triplet | Code Available | 0 |
| PaLM2-VAdapter: Progressively Aligned Language Model Makes a Strong Vision-language Adapter | Feb 16, 2024 | Language Modeling, Language Modelling | Unverified | 0 |
| VQAttack: Transferable Adversarial Attacks on Visual Question Answering via Pre-trained Models | Feb 16, 2024 | Adversarial Robustness, Language Modelling | Unverified | 0 |
| Prompt-based Personalized Federated Learning for Medical Visual Question Answering | Feb 15, 2024 | Federated Learning, Medical Visual Question Answering | Unverified | 0 |
| Pretraining Vision-Language Model for Difference Visual Question Answering in Longitudinal Chest X-rays | Feb 14, 2024 | Language Modeling, Language Modelling | Code Available | 0 |
| Visually Dehallucinative Instruction Generation | Feb 13, 2024 | Hallucination, Language Modeling | Code Available | 0 |
| Visual Question Answering Instruction: Unlocking Multimodal Large Language Model To Domain-Specific Visual Multitasks | Feb 13, 2024 | Language Modeling, Language Modelling | Unverified | 0 |
| Learning How To Ask: Cycle-Consistency Refines Prompts in Multimodal Foundation Models | Feb 13, 2024 | Code Generation, HumanEval | Unverified | 0 |
| Synthesizing Sentiment-Controlled Feedback For Multimodal Text and Image Data | Feb 12, 2024 | Decoder, Marketing | Code Available | 0 |
| PIVOT: Iterative Visual Prompting Elicits Actionable Knowledge for VLMs | Feb 12, 2024 | Instruction Following, Logical Reasoning | Unverified | 0 |
| CIC: A Framework for Culturally-Aware Image Captioning | Feb 8, 2024 | Descriptive, Image Captioning | Unverified | 0 |
| Examining Gender and Racial Bias in Large Vision-Language Models Using a Novel Dataset of Parallel Images | Feb 8, 2024 | Image Captioning, Question Answering | Code Available | 0 |
| Convincing Rationales for Visual Question Answering Reasoning | Feb 6, 2024 | Question Answering, Visual Question Answering | Code Available | 0 |
| Knowledge Generation for Zero-shot Knowledge-based VQA | Feb 4, 2024 | Question Answering, Visual Question Answering | Code Available | 0 |
| Instruction Makes a Difference | Feb 1, 2024 | Hallucination, Instruction Following | Code Available | 0 |
| Can Generative AI Support Patients' & Caregivers' Informational Needs? Towards Task-Centric Evaluation Of AI Systems | Jan 31, 2024 | Computed Tomography (CT), Diagnostic | Unverified | 0 |
| From Training-Free to Adaptive: Empirical Insights into MLLMs' Understanding of Detection Information | Jan 31, 2024 | Hallucination, object-detection | Unverified | 0 |
| InternLM-XComposer2: Mastering Free-form Text-Image Composition and Comprehension in Vision-Language Large Model | Jan 29, 2024 | Form, Language Modeling | Unverified | 0 |
| Muffin or Chihuahua? Challenging Multimodal Large Language Models with Multipanel VQA | Jan 29, 2024 | Benchmarking, Image Comprehension | Unverified | 0 |
| LCV2: An Efficient Pretraining-Free Framework for Grounded Visual Question Answering | Jan 29, 2024 | Language Modeling, Language Modelling | Unverified | 0 |