Integrating Image Features with Convolutional Sequence-to-sequence Network for Multilingual Visual Question Answering | Mar 22, 2023 | Question Answering, Visual Question Answering | Code Available | 0
TIFA: Accurate and Interpretable Text-to-Image Faithfulness Evaluation with Question Answering | Mar 21, 2023 | 4k, Image Generation | Code Available | 1
eP-ALM: Efficient Perceptual Augmentation of Language Models | Mar 20, 2023 | In-Context Learning, Visual Question Answering (VQA) | Code Available | 1
3D Concept Learning and Reasoning from Multi-View Images | Mar 20, 2023 | Question Answering, Visual Question Answering | Unverified | 0
FVQA 2.0: Introducing Adversarial Samples into Fact-based Visual Question Answering | Mar 19, 2023 | Common Sense Reasoning, Information Retrieval | Unverified | 0
Logical Implications for Visual Question Answering Consistency | Mar 16, 2023 | Language Modeling, Language Modelling | Code Available | 0
VDPVE: VQA Dataset for Perceptual Video Enhancement | Mar 16, 2023 | Deblurring, valid | Code Available | 1
GPT-4 Technical Report | Mar 15, 2023 | answerability prediction, Arithmetic Reasoning | Code Available | 6
MRET: Multi-resolution Transformer for Video Quality Assessment | Mar 13, 2023 | Video Quality Assessment, Video Recognition | Unverified | 0
Polar-VQA: Visual Question Answering on Remote Sensed Ice sheet Imagery from Polar Region | Mar 13, 2023 | Question Answering, Visual Question Answering | Unverified | 0
Vision-Language Models as Success Detectors | Mar 13, 2023 | Question Answering, Visual Question Answering | Unverified | 0
PMC-CLIP: Contrastive Language-Image Pre-training using Biomedical Documents | Mar 13, 2023 | image-classification, Image Classification | Code Available | 2
Breaking Common Sense: WHOOPS! A Vision-and-Language Benchmark of Synthetic and Compositional Images | Mar 13, 2023 | Common Sense Reasoning, Explanation Generation | Unverified | 0
MuLTI: Efficient Video-and-Language Understanding with Text-Guided MultiWay-Sampler and Multiple Choice Modeling | Mar 10, 2023 | Multi-Label Classification, MUlTI-LABEL-ClASSIFICATION | Unverified | 0
Open-Ended Medical Visual Question Answering Through Prefix Tuning of Language Models | Mar 10, 2023 | Language Modeling, Language Modelling | Code Available | 1
Understanding and Constructing Latent Modality Structures in Multi-modal Representation Learning | Mar 10, 2023 | Few-Shot Image Classification, image-classification | Unverified | 0
Toward Unsupervised Realistic Visual Question Answering | Mar 9, 2023 | Question Answering, Visual Question Answering | Unverified | 0
Interpretable Visual Question Answering Referring to Outside Knowledge | Mar 8, 2023 | Diversity, Image Captioning | Unverified | 0
Graph Neural Networks in Vision-Language Image Understanding: A Survey | Mar 7, 2023 | Image Captioning, Image Retrieval | Unverified | 0
PaLM-E: An Embodied Multimodal Language Model | Mar 6, 2023 | Language Modeling, Language Modelling | Code Available | 2
Knowledge-Based Counterfactual Queries for Visual Question Answering | Mar 5, 2023 | counterfactual, Decision Making | Unverified | 0
VTQA: Visual Text Question Answering via Entity Alignment and Cross-Media Reasoning | Mar 5, 2023 | Answer Generation, Entity Alignment | Code Available | 0
Audio-Visual Quality Assessment for User Generated Content: Database and Method | Mar 4, 2023 | Video Quality Assessment, Visual Question Answering (VQA) | Unverified | 0
Prismer: A Vision-Language Model with Multi-Task Experts | Mar 4, 2023 | Few-Shot Learning, Image Captioning | Code Available | 1
Prophet: Prompting Large Language Models with Complementary Answer Heuristics for Knowledge-based Visual Question Answering | Mar 3, 2023 | Language Modelling, Large Language Model | Code Available | 2
ConTEXTual Net: A Multimodal Vision-Language Model for Segmentation of Pneumothorax | Mar 2, 2023 | Descriptive, Image Captioning | Code Available | 1
BiomedCLIP: a multimodal biomedical foundation model pretrained from fifteen million scientific image-text pairs | Mar 2, 2023 | Articles, Medical Visual Question Answering | Code Available | 1
MixPHM: Redundancy-Aware Parameter-Efficient Tuning for Low-Resource Visual Question Answering | Mar 2, 2023 | Mixture-of-Experts, Question Answering | Code Available | 1
RAMM: Retrieval-augmented Biomedical Visual Question Answering with Multi-modal Pre-training | Mar 1, 2023 | Question Answering, Retrieval | Code Available | 1
VQA with Cascade of Self- and Co-Attention Blocks | Feb 28, 2023 | Question Answering, Visual Question Answering | Unverified | 0
Exploring Opinion-unaware Video Quality Assessment with Semantic Affinity Criterion | Feb 26, 2023 | Video Quality Assessment, Visual Question Answering (VQA) | Code Available | 1
Medical visual question answering using joint self-supervised learning | Feb 25, 2023 | Decoder, Diversity | Unverified | 0
EVJVQA Challenge: Multilingual Visual Question Answering | Feb 23, 2023 | Language Modeling, Language Modelling | Unverified | 0
Can Pre-trained Vision and Language Models Answer Visual Information-Seeking Questions? | Feb 23, 2023 | Open-Domain Question Answering, Question Answering | Code Available | 1
VinVL+L: Enriching Visual Representation with Location Context in VQA | Feb 22, 2023 | Question Answering, TAG | Code Available | 0
Open-domain Visual Entity Recognition: Towards Recognizing Millions of Wikipedia Entities | Feb 22, 2023 | Entity Linking, Fine-Grained Image Recognition | Code Available | 1
Few-shot Multimodal Multitask Multilingual Learning | Feb 19, 2023 | Few-Shot Learning, In-Context Learning | Unverified | 0
Interpretable Medical Image Visual Question Answering via Multi-Modal Relationship Graph Learning | Feb 19, 2023 | Graph Learning, Medical Visual Question Answering | Unverified | 0
Bridge Damage Cause Estimation Using Multiple Images Based on Visual Question Answering | Feb 18, 2023 | Question Answering, Visual Question Answering | Unverified | 0
Multimodal Federated Learning via Contrastive Representation Ensemble | Feb 17, 2023 | Federated Learning, Image-text Retrieval | Code Available | 1
Towards Unifying Medical Vision-and-Language Pre-training via Soft Prompts | Feb 17, 2023 | Image Retrieval, Image-text Classification | Code Available | 1
UniAdapter: Unified Parameter-Efficient Transfer Learning for Cross-modal Modeling | Feb 13, 2023 | Image-text Retrieval, Retrieval | Code Available | 1
Differentiable Outlier Detection Enable Robust Deep Multimodal Analysis | Feb 11, 2023 | Image-text Retrieval, Knowledge Graphs | Code Available | 0
Is Multimodal Vision Supervision Beneficial to Language? | Feb 10, 2023 | Image Retrieval, Natural Language Understanding | Code Available | 0
Multimodality Representation Learning: A Survey on Evolution, Pretraining and Its Applications | Feb 1, 2023 | Question Answering, Representation Learning | Code Available | 1
mPLUG-2: A Modularized Multi-modal Foundation Model Across Text, Image and Video | Feb 1, 2023 | Action Classification, Image Classification | Code Available | 4
UPop: Unified and Progressive Pruning for Compressing Vision-Language Transformers | Jan 31, 2023 | Image Captioning, Image Classification | Code Available | 1
BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models | Jan 30, 2023 | Generative Visual Question Answering, Image Captioning | Code Available | 4
BinaryVQA: A Versatile Test Set to Evaluate the Out-of-Distribution Generalization of VQA Models | Jan 28, 2023 | Out-of-Distribution Generalization, Question Answering | Code Available | 0
Towards a Unified Model for Generating Answers and Explanations in Visual Question Answering | Jan 25, 2023 | Decoder, Explanation Generation | Unverified | 0