Attention Mechanism based Cognition-level Scene Understanding Apr 17, 2022 Question Answering Scene Understanding
Attention Overlap Is Responsible for The Entity Missing Problem in Text-to-image Diffusion Models! Oct 28, 2024 Denoising Question Answering
Attentive Explanations: Justifying Decisions and Pointing to the Evidence Dec 14, 2016 Decision Making Question Answering
Attentive Explanations: Justifying Decisions and Pointing to the Evidence (Extended Abstract) Nov 17, 2017 Question Answering Visual Question Answering (VQA)
Audio-Visual Quality Assessment for User Generated Content: Database and Method Mar 4, 2023 Video Quality Assessment Visual Question Answering (VQA)
Augmenting Image Question Answering Dataset by Exploiting Image Captions May 1, 2018 Data Augmentation Image Captioning
A Unified Framework for Multilingual and Code-Mixed Visual Question Answering Dec 1, 2020 Question Answering Visual Question Answering
Auto-Parsing Network for Image Captioning and Visual Question Answering Aug 24, 2021 Image Captioning Question Answering
AVIS: Autonomous Visual Information Seeking with Large Language Model Agent Jun 13, 2023 Decision Making Language Modeling
A Vision Centric Remote Sensing Benchmark Mar 20, 2025 Question Answering Representation Learning
A Visual Question Answering Method for SAR Ship: Breaking the Requirement for Multimodal Dataset Construction and Model Fine-Tuning Nov 3, 2024 object-detection Object Detection
Avoiding Barren Plateaus with Classical Deep Neural Networks May 26, 2022 Visual Question Answering (VQA)
Backdooring Vision-Language Models with Out-Of-Distribution Data Oct 2, 2024 Image Captioning Image to text
BACON: Improving Clarity of Image Captions via Bag-of-Concept Graphs Jul 3, 2024 Image Captioning Image Generation
BARTPhoBEiT: Pre-trained Sequence-to-Sequence and Image Transformers Models for Vietnamese Visual Question Answering Jul 28, 2023 Question Answering Vietnamese Visual Question Answering
Bayesian Attention Belief Networks Jun 9, 2021 Decoder Machine Translation
Being Negative but Constructively: Lessons Learnt from Creating Better Visual Question Answering Datasets Apr 24, 2017 Multiple-choice Question Answering
@Bench: Benchmarking Vision-Language Models for Human-centered Assistive Technology Sep 21, 2024 Benchmarking Depth Estimation
Benchmarking Large Multimodal Models for Ophthalmic Visual Question Answering with OphthalWeChat May 26, 2025 Benchmarking Question Answering
Benchmarking Multimodal Models for Ukrainian Language Understanding Across Academic and Cultural Domains Nov 22, 2024 Benchmarking Caption Generation
What BERT Sees: Cross-Modal Transfer for Visual Question Generation Feb 25, 2020 Question Generation Question-Generation
BESTMVQA: A Benchmark Evaluation System for Medical Visual Question Answering Dec 13, 2023 Medical Visual Question Answering Question Answering
Beyond Captioning: Task-Specific Prompting for Improved VLM Performance in Mathematical Reasoning Oct 8, 2024 Image Retrieval Math
Beyond Human Vision: The Role of Large Vision Language Models in Microscope Image Analysis May 1, 2024 Image Captioning Question Answering
Beyond the Hype: A dispassionate look at vision-language models in medical scenario Aug 16, 2024 Question Answering Spatial Reasoning
Beyond VQA: Generating Multi-word Answer and Rationale to Visual Questions Oct 24, 2020 General Classification Multiple-choice
Binding Touch to Everything: Learning Unified Multimodal Tactile Representations Jan 31, 2024 Question Answering Visual Question Answering (VQA)
BloomVQA: Assessing Hierarchical Multi-modal Comprehension Dec 20, 2023 Data Augmentation Memorization
BOK-VQA: Bilingual outside Knowledge-Based Visual Question Answering via Graph Representation Pretraining Jan 12, 2024 Question Answering Visual Question Answering
BRAVE: Broadening the visual encoding of vision-language models Apr 10, 2024 Hallucination Language Modelling
Breaking Common Sense: WHOOPS! A Vision-and-Language Benchmark of Synthetic and Compositional Images Mar 13, 2023 Common Sense Reasoning Explanation Generation
Breaking Down Questions for Outside-Knowledge VQA Sep 29, 2021 Graph Neural Network Question Answering
Breaking Down Questions for Outside-Knowledge Visual Question Answering Nov 16, 2021 Graph Neural Network Question Answering
Bridge Damage Cause Estimation Using Multiple Images Based on Visual Question Answering Feb 18, 2023 Question Answering Visual Question Answering
Bridging the Gap between Recognition-level Pre-training and Commonsensical Vision-language Tasks May 1, 2022 Diversity Informativeness
Bridging the Gap: Exploring the Capabilities of Bridge-Architectures for Complex Visual Reasoning Tasks Jul 31, 2023 Image Retrieval Object
Bridging the Semantic Gaps: Improving Medical VQA Consistency with LLM-Augmented Question Sets Apr 16, 2025 Diversity Medical Visual Question Answering
Bridging Video Quality Scoring and Justification via Large Multimodal Models Jun 26, 2025 Video Quality Assessment Visual Question Answering (VQA)
Bring Remote Sensing Object Detect Into Nature Language Model: Using SFT Method Mar 11, 2025 Language Modeling Language Modelling
BuDDIE: A Business Document Dataset for Multi-task Information Extraction Apr 5, 2024 Document Classification document understanding
Building Trustworthy Multimodal AI: A Review of Fairness, Transparency, and Ethics in Vision-Language Tasks Apr 14, 2025 Ethics Fairness
C3DVQA: Full-Reference Video Quality Assessment with 3D Convolutional Neural Network Oct 30, 2019 Video Quality Assessment Visual Question Answering (VQA)
CalliReader: Contextualizing Chinese Calligraphy via an Embedding-Aligned Vision-Language Model Mar 9, 2025 Hallucination Language Modeling
Can Common VLMs Rival Medical VLMs? Evaluation and Strategic Insights Jun 19, 2025 Question Answering Visual Question Answering
Can Multimodal LLMs do Visual Temporal Understanding and Reasoning? The answer is No! Jan 18, 2025 Multiple-choice Question Answering
Can Open Domain Question Answering Systems Answer Visual Knowledge Questions? Feb 9, 2022 Open-Domain Question Answering Question Answering
Can Pre-training help VQA with Lexical Variations? Nov 1, 2020 Question Answering Visual Question Answering
Can SAR improve RSVQA performance? Aug 28, 2024 Question Answering Visual Question Answering
Can Visual Language Models Replace OCR-Based Visual Question Answering Pipelines in Production? A Case Study in Retail Aug 28, 2024 Optical Character Recognition Optical Character Recognition (OCR)
Can We Generate Visual Programs Without Prompting LLMs? Dec 11, 2024 Data Augmentation Question Answering