Defeasible Visual Entailment: Benchmark, Evaluator, and Reward-Driven Optimization | Dec 19, 2024 | Contrastive Learning Decision Making | Code Available 1
VolDoGer: LLM-assisted Datasets for Domain Generalization in Vision-Language Tasks | Jul 29, 2024 | Deep Learning Domain Generalization | Unverified 0
Advancing Grounded Multimodal Named Entity Recognition via LLM-Based Reformulation and Box-Based Segmentation | Jun 11, 2024 | Grounded Multimodal Named Entity Recognition named-entity-recognition | Code Available 1
Understanding Figurative Meaning through Explainable Visual Entailment | May 2, 2024 | Question Answering Visual Entailment | Code Available 1
MoPE: Mixture of Prompt Experts for Parameter-Efficient and Scalable Multimodal Fusion | Mar 14, 2024 | Disentanglement Multimodal Deep Learning | Code Available 1
VEglue: Testing Visual Entailment Systems via Object-Aligned Joint Erasing | Mar 5, 2024 | Multimodal Reasoning Sentence | Code Available 0
ArcSin: Adaptive ranged cosine Similarity injected noise for Language-Driven Visual Tasks | Feb 27, 2024 | Domain Generalization Image Captioning | Unverified 0
LLMs as Bridges: Reformulating Grounded Multimodal Named Entity Recognition | Feb 15, 2024 | Grounded Multimodal Named Entity Recognition Multi-modal Named Entity Recognition | Code Available 1
p-Laplacian Adaptation for Generative Pre-trained Vision-Language Models | Dec 17, 2023 | Image Captioning Question Answering | Code Available 0
Do LVLMs Understand Charts? Analyzing and Correcting Factual Errors in Chart Captioning | Dec 15, 2023 | Factual Inconsistency Detection in Chart Captioning Image Captioning | Code Available 1
Good Questions Help Zero-Shot Image Reasoning | Dec 4, 2023 | Fine-Grained Image Classification Question Answering | Code Available 1
Lightweight In-Context Tuning for Multimodal Unified Models | Oct 8, 2023 | Image Captioning In-Context Learning | Unverified 0
Stop Pre-Training: Adapt Visual-Language Models to Unseen Languages | Jun 29, 2023 | Image-text Retrieval Machine Translation | Code Available 0
"Let's not Quote out of Context": Unified Vision-Language Pretraining for Context Assisted Image Captioning | Jun 1, 2023 | Image Captioning Keyword Extraction | Unverified 0
I Spy a Metaphor: Large Language Models and Diffusion Models Co-Create Visual Metaphors | May 24, 2023 | Visual Entailment | Code Available 1
Understanding and Constructing Latent Modality Structures in Multi-modal Representation Learning | Mar 10, 2023 | Few-Shot Image Classification image-classification | Unverified 0
Few-shot Multimodal Multitask Multilingual Learning | Feb 19, 2023 | Few-Shot Learning In-Context Learning | Unverified 0
Benchmarking Robustness of Multimodal Image-Text Models under Distribution Shift | Dec 15, 2022 | Benchmarking Image Captioning | Code Available 1
Harnessing the Power of Multi-Task Pretraining for Ground-Truth Level Natural Language Explanations | Dec 8, 2022 | Explanation Generation Visual Entailment | Code Available 1
Compound Tokens: Channel Fusion for Vision-Language Representation Learning | Dec 2, 2022 | Decoder Language Modeling | Unverified 0
A survey on knowledge-enhanced multimodal learning | Nov 19, 2022 | Conditional Image Generation Factual Visual Question Answering | Unverified 0
I Can't Believe There's No Images! Learning Visual Tasks Using only Language Supervision | Nov 17, 2022 | Image Captioning Question Answering | Code Available 1
AlignVE: Visual Entailment Recognition Based on Alignment Relations | Nov 16, 2022 | Question Answering Relation | Unverified 0
MAP: Multimodal Uncertainty-Aware Vision-Language Pre-training Model | Oct 11, 2022 | Contrastive Learning Image-text matching | Code Available 1
Pre-training image-language transformers for open-vocabulary tasks | Sep 9, 2022 | Question Answering Visual Entailment | Unverified 0
Efficient Vision-Language Pretraining with Visual Concepts and Hierarchical Alignment | Aug 29, 2022 | cross-modal alignment Image-text Retrieval | Code Available 1
Prompt Tuning for Generative Multimodal Pretrained Models | Aug 4, 2022 | Image Captioning Visual Entailment | Code Available 0
Chunk-aware Alignment and Lexical Constraint for Visual Entailment with Natural Language Explanations | Jul 23, 2022 | Decision Making Explanation Generation | Code Available 0
MixGen: A New Multi-Modal Data Augmentation | Jun 16, 2022 | Data Augmentation Image-text Retrieval | Code Available 1
CoCa: Contrastive Captioners are Image-Text Foundation Models | May 4, 2022 | Action Classification Decoder | Code Available 1
Answer-Me: Multi-Task Open-Vocabulary Visual Question Answering | May 2, 2022 | Decoder Image Captioning | Unverified 0
Visual Spatial Reasoning | Apr 30, 2022 | Spatial Reasoning | Code Available 1
Multimodal Adaptive Distillation for Leveraging Unimodal Encoders for Vision-Language Tasks | Apr 22, 2022 | Question Answering Visual Commonsense Reasoning | Unverified 0
Fine-Grained Visual Entailment | Mar 29, 2022 | Multimodal Reasoning Visual Entailment | Code Available 1
CLIP Models are Few-shot Learners: Empirical Studies on VQA and Visual Entailment | Mar 14, 2022 | parameter-efficient fine-tuning Question Answering | Unverified 0
NLX-GPT: A Model for Natural Language Explanations in Vision and Vision-Language Tasks | Mar 9, 2022 | Decision Making Explainable artificial intelligence | Code Available 1
Unsupervised Vision-and-Language Pre-training via Retrieval-based Multi-Granular Alignment | Mar 1, 2022 | Retrieval Sentence | Unverified 0
OFA: Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-to-Sequence Learning Framework | Feb 7, 2022 | Image Captioning image-classification | Code Available 0
CLIP-TD: CLIP Targeted Distillation for Vision-Language Tasks | Jan 15, 2022 | Question Answering Visual Commonsense Reasoning | Unverified 0
Logically at Factify 2022: Multimodal Fact Verification | Dec 16, 2021 | Benchmarking Fact Checking | Unverified 0
Distilled Dual-Encoder Model for Vision-Language Understanding | Dec 16, 2021 | Image to text model | Code Available 1
Unified Multimodal Pre-training and Prompt-based Tuning for Vision-Language Understanding and Generation | Dec 10, 2021 | Image-text matching Image-text Retrieval | Unverified 0
How Much Can CLIP Benefit Vision-and-Language Tasks? | Sep 29, 2021 | Question Answering Visual Entailment | Unverified 0
Check It Again: Progressive Visual Question Answering via Visual Entailment | Aug 1, 2021 | Question Answering Visual Entailment | Code Available 1
How Much Can CLIP Benefit Vision-and-Language Tasks? | Jul 13, 2021 | Question Answering Vision and Language Navigation | Code Available 1
Probing Inter-modality: Visual Parsing with Self-Attention for Vision-Language Pre-training | Jun 25, 2021 | Image-text Retrieval Question Answering | Unverified 0
Check It Again: Progressive Visual Question Answering via Visual Entailment | Jun 8, 2021 | Question Answering Visual Entailment | Code Available 1
Probing Inter-modality: Visual Parsing with Self-Attention for Vision-and-Language Pre-training | May 21, 2021 | Question Answering Relation | Unverified 0
Playing Lottery Tickets with Vision and Language | Apr 23, 2021 | Image-text Retrieval Question Answering | Unverified 0
Seeing Out of tHe bOx: End-to-End Pre-training for Vision-Language Representation Learning | Apr 7, 2021 | Representation Learning Retrieval | Code Available 1