| Understanding Figurative Meaning through Explainable Visual Entailment | May 2, 2024 | Question Answering, Visual Entailment | Code Available | 1 |
| Beyond Human Vision: The Role of Large Vision Language Models in Microscope Image Analysis | May 1, 2024 | Image Captioning, Question Answering | Unverified | 0 |
| CREPE: Coordinate-Aware End-to-End Document Parser | May 1, 2024 | document understanding, Optical Character Recognition (OCR) | Unverified | 0 |
| Enhanced Textual Feature Extraction for Visual Question Answering: A Simple Convolutional Approach | May 1, 2024 | Computational Efficiency, Question Answering | Unverified | 0 |
| TableVQA-Bench: A Visual Question Answering Benchmark on Multiple Table Domains | Apr 30, 2024 | Language Modelling, Large Language Model | Code Available | 1 |
| Multi-Page Document Visual Question Answering using Self-Attention Scoring Mechanism | Apr 29, 2024 | document understanding, GPU | Code Available | 0 |
| ViOCRVQA: Novel Benchmark Dataset and Vision Reader for Visual Question Answering by Understanding Vietnamese Text in Images | Apr 29, 2024 | Optical Character Recognition, Optical Character Recognition (OCR) | Code Available | 1 |
| List Items One by One: A New Data Source and Learning Paradigm for Multimodal LLMs | Apr 25, 2024 | Visual Grounding, Visual Question Answering | Code Available | 2 |
| How Far Are We to GPT-4V? Closing the Gap to Commercial Multimodal Models with Open-Source Suites | Apr 25, 2024 | 4k, Language Modeling | Unverified | 0 |
| Efficiency in Focus: LayerNorm as a Catalyst for Fine-tuning Medical Visual Language Pre-trained Models | Apr 25, 2024 | Medical Visual Question Answering, parameter-efficient fine-tuning | Unverified | 0 |
| Fusion of Domain-Adapted Vision and Language Models for Medical Visual Question Answering | Apr 24, 2024 | Language Modeling, Language Modelling | Unverified | 0 |
| Wiki-LLaVA: Hierarchical Retrieval-Augmented Generation for Multimodal LLMs | Apr 23, 2024 | Question Answering, Retrieval | Unverified | 0 |
| GSCo: Towards Generalizable AI in Medicine via Generalist-Specialist Collaboration | Apr 23, 2024 | Collaborative Inference, In-Context Learning | Code Available | 2 |
| Grounded Knowledge-Enhanced Medical VLP for Chest X-Ray | Apr 23, 2024 | Medical Visual Question Answering, Question Answering | Unverified | 0 |
| WangLab at MEDIQA-M3G 2024: Multimodal Medical Answer Generation using Large Language Models | Apr 22, 2024 | Answer Generation, image-classification | Unverified | 0 |
| Self-Bootstrapped Visual-Language Model for Knowledge Selection and Question Answering | Apr 22, 2024 | Language Modeling, Language Modelling | Code Available | 0 |
| Lost in Space: Probing Fine-grained Spatial Understanding in Vision and Language Resamplers | Apr 21, 2024 | Diagnostic, Image Captioning | Code Available | 0 |
| Exploring Diverse Methods in Visual Question Answering | Apr 21, 2024 | Question Answering, Visual Question Answering | Unverified | 0 |
| LaPA: Latent Prompt Assist Model For Medical Visual Question Answering | Apr 19, 2024 | Medical Visual Question Answering, Question Answering | Code Available | 1 |
| PDF-MVQA: A Dataset for Multimodal Information Retrieval in PDF-based Visual Question Answering | Apr 19, 2024 | Articles, Information Retrieval | Unverified | 0 |
| Look Before You Decide: Prompting Active Deduction of MLLMs for Assumptive Reasoning | Apr 19, 2024 | Benchmarking, counterfactual | Unverified | 0 |
| TextSquare: Scaling up Text-Centric Visual Instruction Tuning | Apr 19, 2024 | Hallucination, Hallucination Evaluation | Unverified | 0 |
| Look, Listen, and Answer: Overcoming Biases for Audio-Visual Question Answering | Apr 18, 2024 | Audio-visual Question Answering, Audio-Visual Question Answering (AVQA) | Code Available | 1 |
| MedThink: Explaining Medical Visual Question Answering via Multimodal Decision-Making Rationale | Apr 18, 2024 | Decision Making, Medical Visual Question Answering | Unverified | 0 |
| Self-Supervised Visual Preference Alignment | Apr 16, 2024 | 8k, MM-Vet | Code Available | 2 |