Does my multimodal model learn cross-modal interactions? It's harder to tell than you might think! Oct 13, 2020 Diagnostic Image-text Classification
AdvDreamer Unveils: Are Vision-Language Models Truly Ready for Real-World 3D Variations? Dec 4, 2024 Benchmarking Visual Question Answering (VQA)
Document Visual Question Answering Challenge 2020 Aug 20, 2020 Question Answering Retrieval
An Empirical Study on the Language Modal in Visual Question Answering May 17, 2023 Question Answering Visual Question Answering
Document Collection Visual Question Answering Apr 27, 2021 document understanding Question Answering
A Systematic Evaluation of GPT-4V's Multimodal Capability for Medical Image Analysis Oct 31, 2023 Descriptive Medical Image Analysis
Hyper-dimensional computing for a visual question-answering system that is trainable end-to-end Nov 28, 2017 Question Answering Visual Question Answering
Hypo3D: Exploring Hypothetical Reasoning in 3D Feb 2, 2025 Question Answering Visual Question Answering
Document AI: Benchmarks, Models and Applications Nov 16, 2021 Deep Learning Document AI
An Empirical Study on the Generalization Power of Neural Representations Learned via Visual Guessing Games Jan 31, 2021 Question Answering Visual Question Answering
Binding Touch to Everything: Learning Unified Multimodal Tactile Representations Jan 31, 2024 Question Answering Visual Question Answering (VQA)
Divide, Evaluate, and Refine: Evaluating and Improving Text-to-Image Alignment with Iterative VQA Feedback Jul 10, 2023 Image Generation Visual Question Answering (VQA)
A Comprehensive Evaluation of Multi-Modal Large Language Models for Endoscopy Analysis May 29, 2025 Diagnostic Visual Prompting
HVS Revisited: A Comprehensive Video Quality Assessment Framework Oct 9, 2022 Video Quality Assessment Visual Question Answering (VQA)
Diversity and Consistency: Exploring Visual Question-Answer Pair Generation Nov 1, 2021 Diversity Question Answering
Advancing Video Quality Assessment for AIGC Sep 23, 2024 Image Generation Text Generation
Distraction-free Embeddings for Robust VQA Aug 31, 2023 Question Answering Video Question Answering
Hummingbird: High Fidelity Image Generation via Multimodal Context Alignment Feb 7, 2025 Diversity Human-Object Interaction Detection
Disentangling Knowledge-based and Visual Reasoning by Question Decomposition in KB-VQA Jun 27, 2024 General Knowledge Question Answering
Beyond VQA: Generating Multi-word Answer and Rationale to Visual Questions Oct 24, 2020 General Classification Multiple-choice
An Empirical Study on Leveraging Scene Graphs for Visual Question Answering Jul 28, 2019 Knowledge Graphs Question Answering
Human Attention in Visual Question Answering: Do Humans and Deep Networks Look at the Same Regions? Jun 17, 2016 Question Answering Visual Question Answering
Directional Gradient Projection for Robust Fine-Tuning of Foundation Models Feb 21, 2025 image-classification Image Classification
Beyond the Hype: A dispassionate look at vision-language models in medical scenario Aug 16, 2024 Question Answering Spatial Reasoning
DiN: Diffusion Model for Robust Medical VQA with Semantic Noisy Labels Mar 24, 2025 Medical Visual Question Answering Question Answering
DiffVQA: Video Quality Assessment Using Diffusion Feature Extractor May 6, 2025 Mamba Video Quality Assessment
Advancing Surgical VQA with Scene Graph Knowledge Dec 15, 2023 Question Answering Visual Question Answering
Human Attention in Visual Question Answering: Do Humans and Deep Networks Look at the Same Regions? Jun 11, 2016 Question Answering Visual Question Answering
Hyperbolic Attention Networks May 24, 2018 Machine Translation Question Answering
How to find a good image-text embedding for remote sensing visual question answering? Sep 24, 2021 Question Answering Visual Question Answering
How Transferable are Reasoning Patterns in VQA? Apr 8, 2021 Question Answering Visual Question Answering
Advancing Multimodal Medical Capabilities of Gemini May 6, 2024 Computed Tomography (CT) image-classification
How to Design Sample and Computationally Efficient VQA Models Mar 22, 2021 Question Answering Visual Question Answering
How Well Can Vision-Language Models Understand Humans' Intention? An Open-ended Theory of Mind Question Evaluation Benchmark Mar 28, 2025 Question Answering Visual Question Answering
Differentiable End-to-End Program Executor for Sample and Computationally Efficient VQA Jan 1, 2021 Question Answering Visual Question Answering
DIEM: Decomposition-Integration Enhancing Multimodal Insights Jan 1, 2024 MM-Vet Question Answering
Beyond Human Vision: The Role of Large Vision Language Models in Microscope Image Analysis May 1, 2024 Image Captioning Question Answering
Beyond Captioning: Task-Specific Prompting for Improved VLM Performance in Mathematical Reasoning Oct 8, 2024 Image Retrieval Math
How (not) to ensemble LVLMs for VQA Oct 10, 2023 Retrieval Visual Question Answering (VQA)
HRVQA: A Visual Question Answering Benchmark for High-Resolution Aerial Images Jan 23, 2023 Attribute Question Answering
Advancing Large Multi-modal Models with Explicit Chain-of-Reasoning and Visual Question Generation Jan 18, 2024 Caption Generation Language Modeling
Detecting Multimodal Situations with Insufficient Context and Abstaining from Baseless Predictions May 18, 2024 Visual Question Answering (VQA)
Detect, Describe, Discriminate: Moving Beyond VQA for MLLM Evaluation Sep 23, 2024 Multiple-choice Question Answering
BESTMVQA: A Benchmark Evaluation System for Medical Visual Question Answering Dec 13, 2023 Medical Visual Question Answering Question Answering
Detect2Interact: Localizing Object Key Field in Visual Question Answering (VQA) with LLMs Apr 1, 2024 Common Sense Reasoning Object
An Empirical Study of Batch Normalization and Group Normalization in Conditional Computation Jul 31, 2019 Conditional Image Generation Few-Shot Learning
How good are deep models in understanding the generated images? Aug 23, 2022 Object Object Recognition
How Much Can CLIP Benefit Vision-and-Language Tasks? Sep 29, 2021 Question Answering Visual Entailment
DePlot: One-shot visual language reasoning by plot-to-table translation Dec 20, 2022 Chart Question Answering Factual Inconsistency Detection in Chart Captioning
What BERT Sees: Cross-Modal Transfer for Visual Question Generation Feb 25, 2020 Question Generation Question-Generation