FVQA 2.0: Introducing Adversarial Samples into Fact-based Visual Question Answering | Mar 19, 2023 | Common Sense Reasoning, Information Retrieval | Unverified
Logical Implications for Visual Question Answering Consistency | Mar 16, 2023 | Language Modeling | Code available
Breaking Common Sense: WHOOPS! A Vision-and-Language Benchmark of Synthetic and Compositional Images | Mar 13, 2023 | Common Sense Reasoning, Explanation Generation | Unverified
MRET: Multi-resolution Transformer for Video Quality Assessment | Mar 13, 2023 | Video Quality Assessment, Video Recognition | Unverified
Polar-VQA: Visual Question Answering on Remote Sensed Ice sheet Imagery from Polar Region | Mar 13, 2023 | Question Answering, Visual Question Answering | Unverified
Vision-Language Models as Success Detectors | Mar 13, 2023 | Question Answering, Visual Question Answering | Unverified
MuLTI: Efficient Video-and-Language Understanding with Text-Guided MultiWay-Sampler and Multiple Choice Modeling | Mar 10, 2023 | Multi-Label Classification | Unverified
Understanding and Constructing Latent Modality Structures in Multi-modal Representation Learning | Mar 10, 2023 | Few-Shot Image Classification, image-classification | Unverified
Toward Unsupervised Realistic Visual Question Answering | Mar 9, 2023 | Question Answering, Visual Question Answering | Unverified
Interpretable Visual Question Answering Referring to Outside Knowledge | Mar 8, 2023 | Diversity, Image Captioning | Unverified
Graph Neural Networks in Vision-Language Image Understanding: A Survey | Mar 7, 2023 | Image Captioning, Image Retrieval | Unverified
Knowledge-Based Counterfactual Queries for Visual Question Answering | Mar 5, 2023 | counterfactual, Decision Making | Unverified
VTQA: Visual Text Question Answering via Entity Alignment and Cross-Media Reasoning | Mar 5, 2023 | Answer Generation, Entity Alignment | Code available
Audio-Visual Quality Assessment for User Generated Content: Database and Method | Mar 4, 2023 | Video Quality Assessment, Visual Question Answering (VQA) | Unverified
VQA with Cascade of Self- and Co-Attention Blocks | Feb 28, 2023 | Question Answering, Visual Question Answering | Unverified
Medical visual question answering using joint self-supervised learning | Feb 25, 2023 | Decoder, Diversity | Unverified
EVJVQA Challenge: Multilingual Visual Question Answering | Feb 23, 2023 | Language Modeling | Unverified
VinVL+L: Enriching Visual Representation with Location Context in VQA | Feb 22, 2023 | Question Answering, TAG | Code available
Few-shot Multimodal Multitask Multilingual Learning | Feb 19, 2023 | Few-Shot Learning, In-Context Learning | Unverified
Interpretable Medical Image Visual Question Answering via Multi-Modal Relationship Graph Learning | Feb 19, 2023 | Graph Learning, Medical Visual Question Answering | Unverified
Bridge Damage Cause Estimation Using Multiple Images Based on Visual Question Answering | Feb 18, 2023 | Question Answering, Visual Question Answering | Unverified
Differentiable Outlier Detection Enable Robust Deep Multimodal Analysis | Feb 11, 2023 | Image-text Retrieval, Knowledge Graphs | Code available
Is Multimodal Vision Supervision Beneficial to Language? | Feb 10, 2023 | Image Retrieval, Natural Language Understanding | Code available
BinaryVQA: A Versatile Test Set to Evaluate the Out-of-Distribution Generalization of VQA Models | Jan 28, 2023 | Out-of-Distribution Generalization, Question Answering | Code available
Towards a Unified Model for Generating Answers and Explanations in Visual Question Answering | Jan 25, 2023 | Decoder, Explanation Generation | Unverified
HRVQA: A Visual Question Answering Benchmark for High-Resolution Aerial Images | Jan 23, 2023 | Attribute, Question Answering | Unverified
Towards Models that Can See and Read | Jan 18, 2023 | Decoder, Image Captioning | Unverified
Curriculum Script Distillation for Multilingual Visual Question Answering | Jan 17, 2023 | Question Answering, Visual Question Answering | Unverified
Toward Building General Foundation Models for Language, Vision, and Vision-Language Understanding Tasks | Jan 12, 2023 | Cross-Modal Retrieval, Open-Ended Question Answering | Code available
Adaptively Clustering Neighbor Elements for Image-Text Generation | Jan 5, 2023 | Clustering, Decoder | Code available
PromptCap: Prompt-Guided Image Captioning for VQA with GPT-3 | Jan 1, 2023 | Image Captioning, Question Answering | Unverified
Toward Multi-Granularity Decision-Making: Explicit Visual Reasoning with Hierarchical Knowledge | Jan 1, 2023 | Decision Making, Question Answering | Code available
Dynamic Inference With Grounding Based Vision and Language Models | Jan 1, 2023 | Language Modelling, Referring Expression | Unverified
RMLVQA: A Margin Loss Approach for Visual Question Answering With Language Biases | Jan 1, 2023 | Question Answering, Visual Question Answering | Unverified
From Images to Textual Prompts: Zero-Shot Visual Question Answering With Frozen Large Language Models | Jan 1, 2023 | Question Answering, Visual Question Answering | Unverified
Decouple Before Interact: Multi-Modal Prompt Learning for Continual Visual Question Answering | Jan 1, 2023 | Continual Learning, Language Modelling | Unverified
HiTeA: Hierarchical Temporal-Aware Video-Language Pre-training | Dec 30, 2022 | cross-modal alignment, TGIF-Action | Unverified
VQA and Visual Reasoning: An Overview of Recent Datasets, Methods and Challenges | Dec 26, 2022 | Representation Learning, Visual Question Answering (VQA) | Unverified
When are Lemons Purple? The Concept Association Bias of Vision-Language Models | Dec 22, 2022 | Attribute, image-classification | Unverified
From Images to Textual Prompts: Zero-shot VQA with Frozen Large Language Models | Dec 21, 2022 | Question Answering, Visual Question Answering | Code available
UnICLAM: Contrastive Representation Learning with Adversarial Masking for Unified and Interpretable Medical Vision Question Answering | Dec 21, 2022 | Data Augmentation, Decision Making | Unverified
DePlot: One-shot visual language reasoning by plot-to-table translation | Dec 20, 2022 | Chart Question Answering, Factual Inconsistency Detection in Chart Captioning | Unverified
Towards Unsupervised Visual Reasoning: Do Off-The-Shelf Features Know How to Reason? | Dec 20, 2022 | Question Answering, Representation Learning | Unverified
MatCha: Enhancing Visual Language Pretraining with Math Reasoning and Chart Derendering | Dec 19, 2022 | Chart Question Answering, Data Summarization | Unverified
SceneGATE: Scene-Graph based co-Attention networks for TExt visual question answering | Dec 16, 2022 | Optical Character Recognition (OCR) | Unverified
CLIPPO: Image-and-Language Understanding from Pixels Only | Dec 15, 2022 | Contrastive Learning, image-classification | Unverified
REVEAL: Retrieval-Augmented Visual-Language Pre-Training with Multi-Source Multimodal Knowledge Memory | Dec 10, 2022 | Image Captioning, Language Modeling | Code available
VideoCoCa: Video-Text Modeling with Zero-Shot Transfer from Contrastive Captioners | Dec 9, 2022 | Question Answering, Retrieval | Unverified
Review of Ansatz Designing Techniques for Variational Quantum Algorithms | Dec 7, 2022 | Visual Question Answering (VQA) | Unverified
ParsVQA-Caps: A Benchmark for Visual Question Answering and Image Captioning in Persian | Dec 7, 2022 | Image Captioning, Question Answering | Unverified