| The Illusion of Competence: Evaluating the Effect of Explanations on Users' Mental Models of Visual Question Answering Systems | Jun 27, 2024 | Question Answering, Visual Question Answering | Code Available | 0 |
| Enhancing Continual Learning in Visual Question Answering with Modality-Aware Feature Distillation | Jun 27, 2024 | Continual Learning, Question Answering | Code Available | 0 |
| Evaluating Fairness in Large Vision-Language Models Across Diverse Demographic Attributes and Prompts | Jun 25, 2024 | Fairness, Question Answering | Code Available | 0 |
| MG-LLaVA: Towards Multi-Granularity Visual Instruction Tuning | Jun 25, 2024 | Object, Object Recognition | Code Available | 2 |
| Claude 3.5 Sonnet Model Card Addendum | Jun 24, 2024 | Code Generation, MMR total | Unverified | 0 |
| MM-SpuBench: Towards Better Understanding of Spurious Biases in Multimodal LLMs | Jun 24, 2024 | Question Answering, Visual Question Answering | Unverified | 0 |
| GPT-4V Explorations: Mining Autonomous Driving | Jun 24, 2024 | Autonomous Driving, Decision Making | Unverified | 0 |
| MR-MLLM: Mutual Reinforcement of Multimodal Comprehension and Vision Perception | Jun 22, 2024 | Common Sense Reasoning, Language Modelling | Unverified | 0 |
| Tri-VQA: Triangular Reasoning Medical Visual Question Answering for Multi-Attribute Analysis | Jun 21, 2024 | Attribute, Medical Visual Question Answering | Unverified | 0 |
| Does Object Grounding Really Reduce Hallucination of Large Vision-Language Models? | Jun 20, 2024 | Caption Generation, Hallucination | Unverified | 0 |
| LIVE: Learnable In-Context Vector for Visual Question Answering | Jun 19, 2024 | In-Context Learning, Question Answering | Code Available | 1 |
| Enhancing Cross-Prompt Transferability in Vision-Language Models through Contextual Injection of Target Tokens | Jun 19, 2024 | Caption Generation, image-classification | Code Available | 0 |
| Diversify, Rationalize, and Combine: Ensembling Multiple QA Strategies for Zero-shot Knowledge-based VQA | Jun 18, 2024 | Question Answering, Visual Question Answering | Code Available | 0 |
| VRSBench: A Versatile Vision-Language Benchmark Dataset for Remote Sensing Image Understanding | Jun 18, 2024 | Image Captioning, Question Answering | Code Available | 2 |
| TroL: Traversal of Layers for Large Language and Vision Models | Jun 18, 2024 | Visual Question Answering | Code Available | 2 |
| MMNeuron: Discovering Neuron-Level Domain-Specific Interpretation in Multimodal Large Language Model | Jun 17, 2024 | Language Modeling, Language Modelling | Code Available | 1 |
| LLARVA: Vision-Action Instruction Tuning Enhances Robot Learning | Jun 17, 2024 | Image Captioning, Question Answering | Unverified | 0 |
| MFC-Bench: Benchmarking Multimodal Fact-Checking with Large Vision-Language Models | Jun 17, 2024 | Benchmarking, Fact Checking | Code Available | 1 |
| MMDU: A Multi-Turn Multi-Image Dialog Understanding Benchmark and Instruction-Tuning Dataset for LVLMs | Jun 17, 2024 | Visual Question Answering | Code Available | 2 |
| Program Synthesis Benchmark for Visual Programming in XLogoOnline Environment | Jun 17, 2024 | Logical Reasoning, Math | Unverified | 0 |
| Mixture-of-Subspaces in Low-Rank Adaptation | Jun 16, 2024 | Common Sense Reasoning, Image Generation | Code Available | 0 |
| Beyond Raw Videos: Understanding Edited Videos with Large Multimodal Model | Jun 15, 2024 | Question Answering, Video Understanding | Code Available | 0 |
| VANE-Bench: Video Anomaly Evaluation Benchmark for Conversational LMMs | Jun 14, 2024 | Anomaly Detection, Benchmarking | Code Available | 1 |
| Precision Empowers, Excess Distracts: Visual Question Answering With Dynamically Infused Knowledge In Language Models | Jun 14, 2024 | Decoder, Knowledge Graphs | Unverified | 0 |
| Detecting and Evaluating Medical Hallucinations in Large Vision Language Models | Jun 14, 2024 | Hallucination, Medical Visual Question Answering | Unverified | 0 |