| Uncovering Bias in Large Vision-Language Models with Counterfactuals | Mar 29, 2024 | counterfactual, Question Answering | Unverified | 0 |
| VHM: Versatile and Honest Vision Language Model for Remote Sensing Image Analysis | Mar 29, 2024 | Hallucination, Image Captioning | Code Available | 2 |
| Unsolvable Problem Detection: Evaluating Trustworthiness of Vision Language Models | Mar 29, 2024 | Question Answering, Visual Question Answering | Code Available | 2 |
| JDocQA: Japanese Document Question Answering Dataset for Generative Language Models | Mar 28, 2024 | Hallucination, Question Answering | Code Available | 1 |
| Multi-Frame, Lightweight & Efficient Vision-Language Models for Question Answering in Autonomous Driving | Mar 28, 2024 | Autonomous Driving, Language Modeling | Code Available | 2 |
| Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models | Mar 27, 2024 | Image Classification, Image Comprehension | Code Available | 7 |
| Beyond Embeddings: The Promise of Visual Table in Visual Reasoning | Mar 27, 2024 | Representation Learning, Visual Question Answering | Code Available | 1 |
| Quantifying and Mitigating Unimodal Biases in Multimodal Large Language Models: A Causal Perspective | Mar 27, 2024 | Question Answering, Visual Question Answering | Code Available | 1 |
| Intrinsic Subgraph Generation for Interpretable Graph based Visual Question Answering | Mar 26, 2024 | Decision Making, Explainable artificial intelligence | Code Available | 0 |
| Visual Hallucination: Definition, Quantification, and Prescriptive Remediations | Mar 26, 2024 | Hallucination, Image Captioning | Unverified | 0 |
| A Gaze-grounded Visual Question Answering Dataset for Clarifying Ambiguous Japanese Questions | Mar 26, 2024 | Gaze Target Estimation, Question Answering | Unverified | 0 |
| PropTest: Automatic Property Testing for Improved Visual Programming | Mar 25, 2024 | Question Answering, Referring Expression | Unverified | 0 |
| Synthesize Step-by-Step: Tools, Templates and LLMs as Data Generators for Reasoning-Based Chart VQA | Mar 25, 2024 | Chart Question Answering, Data Augmentation | Unverified | 0 |
| IllusionVQA: A Challenging Optical Illusion Dataset for Vision Language Models | Mar 23, 2024 | Common Sense Reasoning, In-Context Learning | Code Available | 1 |
| Surgical-LVLM: Learning to Adapt Large Vision-Language Model for Grounded Visual Question Answering in Robotic Surgery | Mar 22, 2024 | Language Modeling, Language Modelling | Unverified | 0 |
| MedPromptX: Grounded Multimodal Prompting for Chest X-ray Diagnosis | Mar 22, 2024 | Medical Diagnosis, Medical Visual Question Answering | Code Available | 2 |
| LLaVA-PruMerge: Adaptive Token Reduction for Efficient Large Multimodal Models | Mar 22, 2024 | Language Modelling, Large Language Model | Code Available | 2 |
| Multi-Agent VQA: Exploring Multi-Agent Foundation Models in Zero-Shot Visual Question Answering | Mar 21, 2024 | object-detection, Object Detection | Code Available | 1 |
| Language Repository for Long Video Understanding | Mar 21, 2024 | EgoSchema, Question Answering | Code Available | 1 |
| MyVLM: Personalizing VLMs for User-Specific Queries | Mar 21, 2024 | Image Captioning, Language Modelling | Unverified | 0 |
| VL-Mamba: Exploring State Space Models for Multimodal Learning | Mar 20, 2024 | Language Modeling, Language Modelling | Unverified | 0 |
| Improved Baselines for Data-efficient Perceptual Augmentation of LLMs | Mar 20, 2024 | Audio captioning, Image Captioning | Unverified | 0 |
| HyperLLaVA: Dynamic Visual and Language Expert Tuning for Multimodal Large Language Models | Mar 20, 2024 | MME, Visual Question Answering | Code Available | 1 |
| WoLF: Wide-scope Large Language Model Framework for CXR Understanding | Mar 19, 2024 | Anatomy, Instruction Following | Unverified | 0 |
| VL-ICL Bench: The Devil in the Details of Multimodal In-Context Learning | Mar 19, 2024 | Benchmarking, Image Captioning | Code Available | 2 |