Grounded Word Sense Translation Jun 1, 2019 Grounded language learning Machine Translation
Grounding Answers for Visual Questions Asked by Visually Impaired People Jun 20, 2022 Question Answering Visual Question Answering
Grounding Chest X-Ray Visual Question Answering with Generated Radiology Reports May 22, 2025 Answer Generation Question Answering
Grounding Complex Navigational Instructions Using Scene Graphs Jun 3, 2021 Question Answering reinforcement-learning
Guiding Medical Vision-Language Models with Explicit Visual Prompts: Framework Design and Comprehensive Exploration of Prompt Variations Jan 4, 2025 Decoder Visual Question Answering (VQA)
Guiding Visual Question Answering with Attention Priors May 25, 2022 Question Answering Visual Grounding
Guiding Visual Question Generation Oct 15, 2021 Question Generation Question-Generation
HAMMR: HierArchical MultiModal React agents for generic VQA Apr 8, 2024 Optical Character Recognition (OCR) Question Answering
Hardware-Friendly Static Quantization Method for Video Diffusion Transformers Feb 20, 2025 Quantization Video Generation
HAUR: Human Annotation Understanding and Recognition Through Text-Heavy Images Dec 24, 2024 Optical Character Recognition (OCR) Question Answering
HD-EPIC: A Highly-Detailed Egocentric Video Dataset Feb 6, 2025 Action Recognition Nutrition
HDR-ChipQA: No-Reference Quality Assessment on High Dynamic Range Videos Apr 25, 2023 Video Quality Assessment Visual Question Answering (VQA)
Hierarchical Graph Attention Network for Few-Shot Visual-Semantic Learning Jan 1, 2021 Graph Attention Image Captioning
Hierarchical Memory for Long Video QA Jun 30, 2024 GPU Question Answering
Hierarchical Modeling for Medical Visual Question Answering with Cross-Attention Fusion Apr 4, 2025 Diagnostic Medical Visual Question Answering
High Frame Rate Video Quality Assessment using VMAF and Entropic Differences Sep 27, 2021 Video Quality Assessment Visual Question Answering (VQA)
Highly Efficient No-reference 4K Video Quality Assessment with Full-Pixel Covering Sampling and Training Strategy Jul 30, 2024 4k Video Quality Assessment
Hints of Prompt: Enhancing Visual Representation for Multimodal LLMs in Autonomous Driving Nov 20, 2024 Autonomous Driving Multimodal Reasoning
HiTeA: Hierarchical Temporal-Aware Video-Language Pre-training Dec 30, 2022 cross-modal alignment TGIF-Action
How Far Can Off-the-Shelf Multimodal Large Language Models Go in Online Episodic Memory Question Answering? Jun 19, 2025 Multiple-choice Question Answering
How good are deep models in understanding the generated images? Aug 23, 2022 Object Object Recognition
How Much Can CLIP Benefit Vision-and-Language Tasks? Sep 29, 2021 Question Answering Visual Entailment
How (not) to ensemble LVLMs for VQA Oct 10, 2023 Retrieval Visual Question Answering (VQA)
How to Design Sample and Computationally Efficient VQA Models Mar 22, 2021 Question Answering Visual Question Answering
How to find a good image-text embedding for remote sensing visual question answering? Sep 24, 2021 Question Answering Visual Question Answering
How Transferable are Reasoning Patterns in VQA? Apr 8, 2021 Question Answering Visual Question Answering
How Well Can Vision-Language Models Understand Humans' Intention? An Open-ended Theory of Mind Question Evaluation Benchmark Mar 28, 2025 Question Answering Visual Question Answering
HRVQA: A Visual Question Answering Benchmark for High-Resolution Aerial Images Jan 23, 2023 Attribute Question Answering
Human-Adversarial Visual Question Answering Jun 4, 2021 Question Answering Visual Question Answering
Human Attention in Visual Question Answering: Do Humans and Deep Networks Look at the Same Regions? Jun 17, 2016 Question Answering Visual Question Answering
Human Attention in Visual Question Answering: Do Humans and Deep Networks Look at the Same Regions? Jun 11, 2016 Question Answering Visual Question Answering
Hummingbird: High Fidelity Image Generation via Multimodal Context Alignment Feb 7, 2025 Diversity Human-Object Interaction Detection
HVS Revisited: A Comprehensive Video Quality Assessment Framework Oct 9, 2022 Video Quality Assessment Visual Question Answering (VQA)
Hyperbolic Attention Networks May 24, 2018 Machine Translation Question Answering
Hyper-dimensional computing for a visual question-answering system that is trainable end-to-end Nov 28, 2017 Question Answering Visual Question Answering
Hypo3D: Exploring Hypothetical Reasoning in 3D Feb 2, 2025 Question Answering Visual Question Answering
ICDAR 2019 Competition on Scene Text Visual Question Answering Jun 30, 2019 Question Answering Visual Question Answering
ICDAR 2021 Competition on Document Visual Question Answering Nov 10, 2021 Visual Question Answering (VQA)
CLIPPO: Image-and-Language Understanding from Pixels Only Dec 15, 2022 Contrastive Learning image-classification
Image Captioning and Visual Question Answering Based on Attributes and External Knowledge Mar 9, 2016 General Knowledge Image Captioning
Image Captioning with Compositional Neural Module Networks Jul 10, 2020 Image Captioning Question Answering
Image Manipulation via Multi-Hop Instructions -- A New Dataset and Weakly-Supervised Neuro-Symbolic Approach May 23, 2023 Image Manipulation Question Answering
Image Position Prediction in Multimodal Documents May 1, 2020 Articles Caption Generation
Image Semantic Relation Generation Oct 19, 2022 Image Retrieval Image Segmentation
ImageTTR: Grounding Type Theory with Records in Image Classification for Visual Question Answering Jun 1, 2019 General Classification image-classification
Improved Bilinear Pooling with CNNs Jul 21, 2017 GPU Question Answering
Improved Few-Shot Image Classification Through Multiple-Choice Questions Jul 23, 2024 Articles Few-Shot Image Classification
Improving and Diagnosing Knowledge-Based Visual Question Answering via Entity Enhanced Knowledge Injection Dec 13, 2021 Common Sense Reasoning Knowledge Graph Embeddings
Improving Automatic VQA Evaluation Using Large Language Models Oct 4, 2023 In-Context Learning Question Answering
Improving Cross-Modal Understanding in Visual Dialog via Contrastive Learning Apr 15, 2022 Contrastive Learning Question Answering