UNITER: Learning UNiversal Image-TExt Representations Sep 25, 2019 Image-text matching Image-text Retrieval
Un jeu de données pour répondre à des questions visuelles à propos d’entités nommées en utilisant des bases de connaissances (ViQuAE, a Dataset for Knowledge-based Visual Question Answering about Named Entities) Jun 1, 2022 Question Answering Visual Question Answering
Unleashing the Potential of Large Language Model: Zero-shot VQA for Flood Disaster Scenario Dec 4, 2023 Language Modeling Language Modelling
Unshuffling Data for Improved Generalization Feb 27, 2020 Clustering Data Augmentation
Unshuffling Data for Improved Generalization in Visual Question Answering Jan 1, 2021 Out-of-Distribution Generalization Question Answering
Unsupervised Keyword Extraction for Full-sentence VQA Nov 23, 2019 Keyword Extraction Question Answering
Unsupervised Vision-and-Language Pre-training via Retrieval-based Multi-Granular Alignment Mar 1, 2022 Retrieval Sentence
Unveiling Cross Modality Bias in Visual Question Answering: A Causal View with Possible Worlds VQA May 31, 2023 counterfactual Counterfactual Inference
UPME: An Unsupervised Peer Review Framework for Multimodal Large Language Model Evaluation Mar 19, 2025 Language Model Evaluation Language Modeling
Using Visual Cropping to Enhance Fine-Detail Question Answering of BLIP-Family Models May 31, 2023 Question Answering Visual Question Answering
V^2Dial: Unification of Video and Visual Dialog via Multimodal Experts Mar 3, 2025 Contrastive Learning Text Retrieval
V^2Dial: Unification of Video and Visual Dialog via Multimodal Experts Jan 1, 2025 Contrastive Learning Text Retrieval
VALSE: A Task-Independent Benchmark for Vision and Language Models centered on Linguistic Phenomena Aug 17, 2021 Question Answering Visual Question Answering
Variational Disentangled Attention for Regularized Visual Dialog Sep 29, 2021 Question Answering Visual Dialog
Variational Visual Question Answering May 14, 2025 Question Answering Visual Question Answering
V-Doc : Visual questions answers with Documents May 27, 2022 Question Answering Question Generation
V-Doc: Visual Questions Answers With Documents Jan 1, 2022 Question Answering Question Generation
Verb Mirage: Unveiling and Assessing Verb Concept Hallucinations in Multimodal Large Language Models Dec 6, 2024 Hallucination Optical Character Recognition (OCR)
VGNMN: Video-grounded Neural Module Networks for Video-Grounded Dialogue Systems Jul 1, 2022 Information Retrieval Question Answering
VGNMN: Video-grounded Neural Module Network to Video-Grounded Language Tasks Apr 16, 2021 Information Retrieval Question Answering
Video Instruction Tuning With Synthetic Data Oct 3, 2024 3D Question Answering (3D-QA)
Video Quality Assessment Based on Swin TransformerV2 and Coarse to Fine Strategy Jan 16, 2024 Image Quality Assessment Video Quality Assessment
Video Quality Assessment for Online Processing: From Spatial to Temporal Sampling Jan 13, 2025 Video Quality Assessment Video Understanding
Video Question Answering via Attribute-Augmented Attention Network Learning Jul 20, 2017 Attribute Information Retrieval
Video Question Answering with Iterative Video-Text Co-Tokenization Aug 1, 2022 Question Answering Video Question Answering
Video-Text Dataset Construction from Multi-AI Feedback: Promoting Weak-to-Strong Preference Learning for Video Large Language Models Nov 25, 2024 Visual Question Answering (VQA)
VideoCoCa: Video-Text Modeling with Zero-Shot Transfer from Contrastive Captioners Dec 9, 2022 Question Answering Retrieval
ViLMedic: a framework for research at the intersection of vision and language in medical AI May 1, 2022 Medical Visual Question Answering Question Answering
Vintern-1B: An Efficient Multimodal Large Language Model for Vietnamese Aug 22, 2024 Language Modeling Language Modelling
VisCon-100K: Leveraging Contextual Web Data for Fine-tuning Vision Language Models Feb 14, 2025 Image Captioning Large Language Model
Vision-Amplified Semantic Entropy for Hallucination Detection in Medical Visual Question Answering Mar 26, 2025 Diagnostic Hallucination
Vision and Language: from Visual Perception to Content Creation Dec 26, 2019 Decoder Question Answering
Vision and Language Integration: Moving beyond Objects Jan 1, 2017 Action Classification Image Captioning
Vision-Language Models as Success Detectors Mar 13, 2023 Question Answering Visual Question Answering
Vision-Language Pretraining: Current Trends and the Future May 1, 2022 Question Answering Representation Learning
Vision LLMs Are Bad at Hierarchical Visual Understanding, and LLMs Are the Bottleneck May 30, 2025 Question Answering Visual Question Answering
Vision-to-Language Tasks Based on Attributes and Attention Mechanism May 29, 2019 Image Captioning Question Answering
Visual7W: Grounded Question Answering in Images Nov 11, 2015 Multiple-choice Multiple Choice Question Answering (MCQA)
Visual Commonsense based Heterogeneous Graph Contrastive Learning Nov 11, 2023 Contrastive Learning Question Answering
Visual Entailment: A Novel Task for Fine-Grained Image Understanding Jan 20, 2019 Natural Language Inference Question Answering
Visual Entailment Task for Visually-Grounded Language Learning Nov 26, 2018 Grounded language learning Natural Language Inference
Visual Explanations from Hadamard Product in Multimodal Deep Networks Dec 18, 2017 Question Answering Visual Question Answering
Visual Fact Checker: Enabling High-Fidelity Detailed Caption Generation Apr 30, 2024 Caption Generation Hallucination
Visual Graph Question Answering with ASP and LLMs for Language Parsing Feb 13, 2025 Graph Question Answering Optical Character Recognition
Visual Grounding Strategies for Text-Only Natural Language Processing Mar 25, 2021 Image Retrieval Language Modeling
Visual Hallucination: Definition, Quantification, and Prescriptive Remediations Mar 26, 2024 Hallucination Image Captioning
Visually Guided Spatial Relation Extraction from Text Jun 1, 2018 Activity Recognition Image Captioning
Visual Mechanisms Inspired Efficient Transformers for Image and Video Quality Assessment Mar 28, 2022 Image Quality Assessment Video Quality Assessment
Visual Perturbation-aware Collaborative Learning for Overcoming the Language Prior Problem Jul 24, 2022 Diagnostic Question Answering
Visual Program Distillation: Distilling Tools and Programmatic Reasoning into Vision-Language Models Dec 5, 2023 Language Modeling Language Modelling