Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling Dec 6, 2024 document understanding Hallucination
— Unverified 00 Expanding the Boundaries of Vision Prior Knowledge in Multi-modal Large Language Models Mar 23, 2025 Question Answering Visual Question Answering
— Unverified 00 Explanation vs Attention: A Two-Player Game to Obtain Attention for VQA Nov 19, 2019 Question Answering Visual Question Answering
— Unverified 00 Explicit Bias Discovery in Visual Question Answering Models Nov 19, 2018 Question Answering Visual Question Answering
— Unverified 00 Explicit Knowledge-based Reasoning for Visual Question Answering Nov 9, 2015 Question Answering Visual Question Answering
— Unverified 00 Explicit Reasoning over End-to-End Neural Architectures for Visual Question Answering Mar 23, 2018 Question Answering Visual Question Answering
— Unverified 00 Explore before Moving: A Feasible Path Estimation and Memory Recalling Framework for Embodied Navigation Oct 16, 2021 Common Sense Reasoning Embodied Question Answering
— Unverified 00 Exploring Advanced Techniques for Visual Question Answering: A Comprehensive Comparison Feb 20, 2025 Diversity Language Modeling
— Unverified 00 Exploring Diverse Methods in Visual Question Answering Apr 21, 2024 Question Answering Visual Question Answering
— Unverified 00 Exploring Human-like Attention Supervision in Visual Question Answering Sep 19, 2017 Question Answering Visual Question Answering
— Unverified 00 Exploring Question Decomposition for Zero-Shot VQA Oct 25, 2023 Question Answering Visual Question Answering
— Unverified 00 Exploring Sparse Spatial Relation in Graph Inference for Text-Based VQA Oct 13, 2023 Graph Learning Object
— Unverified 00 Exploring the Effectiveness of Object-Centric Representations in Visual Question Answering: Comparative Insights with Foundation Models Jul 22, 2024 Question Answering Representation Learning
— Unverified 00 Exploring Weaknesses of VQA Models through Attribution Driven Insights Jun 11, 2020 Question Answering Visual Question Answering
— Unverified 00 Extending Class Activation Mapping Using Gaussian Receptive Field Jan 15, 2020 Deep Learning Image Classification
— Unverified 00 EKTVQA: Generalized use of External Knowledge to empower Scene Text in Text-VQA Aug 22, 2021 Open-Ended Question Answering Optical Character Recognition (OCR)
— Unverified 00 Extracting Training Data from Document-Based VQA Models Jul 11, 2024 Memorization Question Answering
— Unverified 00 EyeFound: A Multimodal Generalist Foundation Model for Ophthalmic Imaging May 18, 2024 Question Answering Visual Question Answering
— Unverified 00 Look Before You Decide: Prompting Active Deduction of MLLMs for Assumptive Reasoning Apr 19, 2024 Benchmarking counterfactual
— Unverified 00 EyeSim-VQA: A Free-Energy-Guided Eye Simulation Framework for Video Quality Assessment Jun 13, 2025 Image Quality Assessment Video Quality Assessment
— Unverified 00 F^3OCUS -- Federated Finetuning of Vision-Language Foundation Models with Optimal Client Layer Updating Strategy via Multi-objective Meta-Heuristics Nov 17, 2024 Diversity Federated Learning
— Unverified 00 F^3OCUS - Federated Finetuning of Vision-Language Foundation Models with Optimal Client Layer Updating Strategy via Multi-objective Meta-Heuristics Jan 1, 2025 Diversity Federated Learning
— Unverified 00 FashionVQA: A Domain-Specific Visual Question Answering System Aug 24, 2022 Question Answering Visual Question Answering
— Unverified 00 Fast or Slow? Integrating Fast Intuition and Deliberate Thinking for Enhancing Visual Question Answering Jun 1, 2025 All MME
— Unverified 00 Feature4X: Bridging Any Monocular Video to 4D Agentic AI with Versatile Gaussian Feature Fields Mar 26, 2025 Question Answering Visual Question Answering
— Unverified 00 Few-Shot Image Classification and Segmentation as Visual Question Answering Using Vision-Language Models Mar 15, 2024 Few-Shot Image Classification image-classification
— Unverified 00 Few-shot Multimodal Multitask Multilingual Learning Feb 19, 2023 Few-Shot Learning In-Context Learning
— Unverified 00 Few-Shot VQA with Frozen LLMs: A Tale of Two Approaches Mar 17, 2024 Image Captioning Question Answering
— Unverified 00 FilterRAG: Zero-Shot Informed Retrieval-Augmented Generation to Mitigate Hallucinations in VQA Feb 25, 2025 Question Answering Retrieval
— Unverified 00 Finding the Evidence: Localization-aware Answer Prediction for Text Visual Question Answering Oct 6, 2020 Optical Character Recognition Optical Character Recognition (OCR)
— Unverified 00 Find The Gap: Knowledge Base Reasoning For Visual Question Answering Apr 16, 2024 Question Answering Retrieval
— Unverified 00 Fine-Grained Retrieval-Augmented Generation for Visual Question Answering Feb 28, 2025 Question Answering RAG
— Unverified 00 Correlation Information Bottleneck: Towards Adapting Pretrained Multimodal Models for Robust Visual Question Answering Sep 14, 2022 Adversarial Robustness Question Answering
— Unverified 00 Fine-tuning vs From Scratch: Do Vision & Language Models Have Similar Capabilities on Out-of-Distribution Visual Question Answering? Jun 1, 2022 Question Answering Visual Question Answering
— Unverified 00 FineVQ: Fine-Grained User Generated Content Video Quality Assessment Dec 26, 2024 Video Quality Assessment Visual Question Answering (VQA)
— Unverified 00 FlexCap: Describe Anything in Images in Controllable Detail Mar 18, 2024 Attribute Dense Captioning
— Unverified 00 FMBench: Benchmarking Fairness in Multimodal Large Language Models on Medical Tasks Oct 1, 2024 Benchmarking Fairness
— Unverified 00 Focused Evaluation for Image Description with Binary Forced-Choice Tasks Aug 1, 2016 Image Captioning Image Description
— Unverified 00 FOCUS: Internal MLLM Representations for Efficient Fine-Grained Visual Question Answering Jun 25, 2025 Question Answering Visual Question Answering
— Unverified 00 Focus on What Matters: Enhancing Medical Vision-Language Models with Automatic Attention Alignment Tuning May 24, 2025 Visual Question Answering (VQA)
— Unverified 00 Fooling Vision and Language Models Despite Localization and Attention Mechanism Sep 25, 2017 Dense Captioning Natural Language Understanding
— Unverified 00 Foundational Model for Electron Micrograph Analysis: Instruction-Tuning Small-Scale Language-and-Vision Assistant for Enterprise Adoption Aug 23, 2024 Instruction Following Knowledge Distillation
— Unverified 00 FOVQA: Blind Foveated Video Quality Assessment Jun 24, 2021 Video Compression Video Quality Assessment
— Unverified 00 Free Form Medical Visual Question Answering in Radiology Jan 23, 2024 Diagnostic Form
— Unverified 00 From Easy to Hard: Learning Language-guided Curriculum for Visual Question Answering on Remote Sensing Data May 6, 2022 Question Answering Visual Question Answering
— Unverified 00 From Images to Textual Prompts: Zero-Shot Visual Question Answering With Frozen Large Language Models Jan 1, 2023 Question Answering Visual Question Answering
— Unverified 00 From Image to Language: A Critical Analysis of Visual Question Answering (VQA) Approaches, Challenges, and Opportunities Nov 1, 2023 Navigate Question Answering
— Unverified 00 From Known to the Unknown: Transferring Knowledge to Answer Questions about Novel Visual and Semantic Concepts Nov 30, 2018 Novel Concepts Question Answering
— Unverified 00 From Pixels to Graphs: using Scene and Knowledge Graphs for HD-EPIC VQA Challenge Jun 10, 2025 Knowledge Graphs Language Modeling
— Unverified 00 From Pixels to Objects: Cubic Visual Attention for Visual Question Answering Jun 4, 2022 Object Question Answering
— Unverified 00