Towards Escaping from Language Bias and OCR Error: Semantics-Centered Text Visual Question Answering Mar 24, 2022 Optical Character Recognition Optical Character Recognition (OCR)
Towards General Purpose Vision Systems: An End-to-End Task-Agnostic Vision-Language Architecture Jan 1, 2022 Question Answering Visual Question Answering
Towards Human-Level Understanding of Complex Process Engineering Schematics: A Pedagogical, Introspective Multi-Agent Framework for Open-Domain Question Answering Aug 24, 2024 knowledge editing Open-Domain Question Answering
Towards Models that Can See and Read Jan 18, 2023 Decoder Image Captioning
Towards Reasoning-Aware Explainable VQA Nov 9, 2022 Decoder Explanation Generation
Towards Top-Down Reasoning: An Explainable Multi-Agent Approach for Visual Question Answering Nov 29, 2023 Common Sense Reasoning Question Answering
Towards Transparent AI Systems: Interpreting Visual Question Answering Models Aug 31, 2016 Question Answering Visual Question Answering
Towards Unsupervised Visual Reasoning: Do Off-The-Shelf Features Know How to Reason? Dec 20, 2022 Question Answering Representation Learning
Towards Visual Dialog for Radiology Jul 1, 2020 Question Answering Visual Dialog
Toward Unsupervised Realistic Visual Question Answering Mar 9, 2023 Question Answering Visual Question Answering
Training Recurrent Answering Units with Joint Loss Minimization for VQA Jun 12, 2016 Question Answering Visual Question Answering
Transfer Learning in Visual and Relational Reasoning Nov 27, 2019 Question Answering Relational Reasoning
Transferring Visual Attributes from Natural Language to Verified Image Generation May 24, 2023 Image Generation Text to Image Generation
Transformers in Vision: A Survey Jan 4, 2021 Action Recognition Activity Recognition
Transform-Retrieve-Generate: Natural Language-Centric Outside-Knowledge Visual Question Answering Jan 1, 2022 Generative Question Answering Image to text
Translation Deserves Better: Analyzing Translation Artifacts in Cross-lingual Visual Question Answering Jun 4, 2024 Data Augmentation Machine Translation
Tree Memory Networks for Modelling Long-term Temporal Dependencies Mar 12, 2017 Machine Translation Part-Of-Speech Tagging
Triplet-Aware Scene Graph Embeddings Sep 19, 2019 Data Augmentation Graph Embedding
Tri-VQA: Triangular Reasoning Medical Visual Question Answering for Multi-Attribute Analysis Jun 21, 2024 Attribute Medical Visual Question Answering
TrojVLM: Backdoor Attack Against Vision Language Models Sep 28, 2024 Backdoor Attack Image Captioning
TRRNet: Tiered Relation Reasoning for Compositional Visual Question Answering Aug 1, 2020 Object Question Answering
TruthLens: A Training-Free Paradigm for DeepFake Detection Mar 19, 2025 Binary Classification DeepFake Detection
Trying Bilinear Pooling in Video-QA Dec 18, 2020 Question Answering Video Question Answering
Two can play this Game: Visual Dialog with Discriminative Question Generation and Answering Mar 29, 2018 Image Captioning Question Answering
TxT: Crossmodal End-to-End Learning with Transformers Sep 9, 2021 Multimodal Reasoning Question Answering
UC2: Universal Cross-lingual Cross-modal Vision-and-Language Pre-training Apr 1, 2021 Image-text matching Image-text Retrieval
U-CAM: Visual Explanation using Uncertainty based Class Activation Maps Aug 17, 2019 Deep Learning Probabilistic Deep Learning
SearchLVLMs: A Plug-and-Play Framework for Augmenting Large Vision-Language Models by Searching Up-to-Date Internet Knowledge May 23, 2024 Question Answering RAG
UFO: A UniFied TransfOrmer for Vision-Language Representation Learning Nov 19, 2021 Image Captioning Image-text matching
UIT-Saviors at MEDVQA-GI 2023: Improving Multimodal Learning with Image Enhancement for Gastrointestinal Visual Question Answering Jul 6, 2023 Diagnostic Image Enhancement
Unanswerable Questions about Images and Texts Jan 25, 2021 Question Answering Visual Question Answering
Uncertainty based Class Activation Maps for Visual Question Answering Jan 23, 2020 Deep Learning Probabilistic Deep Learning
Uncertainty-based Visual Question Answering: Estimating Semantic Inconsistency between Image and Knowledge Base Nov 16, 2021 Question Answering Semantic Similarity
Uncertainty-based Visual Question Answering: Estimating Semantic Inconsistency between Image and Knowledge Base Jul 27, 2022 Question Answering Semantic Similarity
Understanding and Constructing Latent Modality Structures in Multi-modal Representation Learning Mar 10, 2023 Few-Shot Image Classification image-classification
Understanding and Mitigating Classification Errors Through Interpretable Token Patterns Nov 18, 2023 Classification NER
Understanding Attention for Vision-and-Language Tasks Dec 17, 2021 Image Generation Image Retrieval
Understanding in Artificial Intelligence Jan 17, 2021 Natural Language Understanding Question Answering
Understanding Information Storage and Transfer in Multi-modal Large Language Models Jun 6, 2024 Factual Visual Question Answering Model Editing
Understanding Knowledge Gaps in Visual Question Answering: Implications for Gap Identification and Testing Apr 8, 2020 Diversity Question Answering
Understanding the Role of Scene Graphs in Visual Question Answering Jan 14, 2021 Graph Generation Question Answering
UnICLAM: Contrastive Representation Learning with Adversarial Masking for Unified and Interpretable Medical Vision Question Answering Dec 21, 2022 Data Augmentation Decision Making
UniCode: Learning a Unified Codebook for Multimodal Large Language Models Mar 14, 2024 Quantization Visual Question Answering (VQA)
Uni-EDEN: Universal Encoder-Decoder Network by Multi-Granular Vision-Language Pre-training Jan 11, 2022 Decoder Image Captioning
Unified-IO 2: Scaling Autoregressive Multimodal Models with Vision Language Audio and Action Jan 1, 2024 Image Generation Instruction Following
Unified-IO: A Unified Model for Vision, Language, and Multi-Modal Tasks Jun 17, 2022 Depth Estimation Image Generation
Unified Multimodal Pre-training and Prompt-based Tuning for Vision-Language Understanding and Generation Dec 10, 2021 Image-text matching Image-text Retrieval
Unified Scene Representation and Reconstruction for 3D Large Language Models Apr 19, 2024 3D Reconstruction Scene Understanding
Uni-Mlip: Unified Self-supervision for Medical Vision Language Pre-training Nov 20, 2024 Contrastive Learning image-classification
UniRVQA: A Unified Framework for Retrieval-Augmented Vision Question Answering via Self-Reflective Joint Training Apr 5, 2025 Articles Question Answering