Rewarded soups: towards Pareto-optimal alignment by interpolating weights fine-tuned on diverse rewards (Jun 7, 2023). Tasks: Diversity, Image Captioning
Q: How to Specialize Large Vision-Language Models to Data-Scarce VQA Tasks? A: Self-Train on Unlabeled Images! (Jun 6, 2023). Tasks: counterfactual, Data Augmentation
DocFormerv2: Local Features for Document Understanding (Jun 2, 2023). Tasks: Decoder, document understanding
Revisiting the Role of Language Priors in Vision-Language Models (Jun 2, 2023). Tasks: Image-text matching, Image-text Retrieval
End-to-end Knowledge Retrieval with Multi-modal Queries (Jun 1, 2023). Tasks: Benchmarking, Cross-Modal Retrieval
Layout and Task Aware Instruction Prompt for Zero-shot Document Image Question Answering (Jun 1, 2023). Tasks: Optical Character Recognition (OCR), Question Answering
PaLI-X: On Scaling up a Multilingual Vision and Language Model (May 29, 2023). Tasks: Chart Question Answering, document understanding
CrossGET: Cross-Guided Ensemble of Tokens for Accelerating Vision-Language Transformers (May 27, 2023). Tasks: Image Captioning, Image Retrieval
Towards Explainable In-the-Wild Video Quality Assessment: A Database and a Language-Prompted Approach (May 22, 2023). Tasks: Video Quality Assessment, Visual Question Answering (VQA)
Enhancing Vision-Language Pre-Training with Jointly Learned Questioner and Dense Captioner (May 19, 2023). Tasks: Dense Captioning, Image Captioning
Surgical-VQLA: Transformer with Gated Vision-Language Embedding for Visual Question Localized-Answering in Robotic Surgery (May 19, 2023). Tasks: Answer Generation, object-detection
MedBLIP: Bootstrapping Language-Image Pre-training from 3D Medical Images and Texts (May 18, 2023). Tasks: Medical Visual Question Answering, Question Answering
PMC-VQA: Visual Instruction Tuning for Medical Visual Question Answering (May 17, 2023). Tasks: Benchmarking, Diagnostic
Light-VQA: A Multi-Dimensional Quality Assessment Model for Low-Light Video Enhancement (May 16, 2023). Tasks: Video Enhancement, Video Quality Assessment
Combo of Thinking and Observing for Outside-Knowledge VQA (May 10, 2023). Tasks: Decoder, Question Answering
Visual Reasoning: from State to Transformation (May 2, 2023). Tasks: Visual Question Answering (VQA), Visual Reasoning
An Empirical Study of Multimodal Model Merging (Apr 28, 2023). Tasks: model, Retrieval
Towards Robust Text-Prompted Semantic Criterion for In-the-Wild Video Quality Assessment (Apr 28, 2023). Tasks: Video Quality Assessment, Visual Question Answering (VQA)
A Symmetric Dual Encoding Dense Retrieval Framework for Knowledge-Intensive Visual Question Answering (Apr 26, 2023). Tasks: Decoder, Knowledge Distillation
SurgicalGPT: End-to-End Language-Vision GPT for Visual Question Answering in Surgery (Apr 19, 2023). Tasks: Question Answering, Scene Segmentation
Learning Situation Hyper-Graphs for Video Question Answering (Apr 18, 2023). Tasks: Decoder, Question Answering
Perceptual Quality Assessment of Face Video Compression: A Benchmark and An Effective Method (Apr 14, 2023). Tasks: Video Compression, Video Quality Assessment
Zoom-VQA: Patches, Frames and Clips Integration for Video Quality Assessment (Apr 13, 2023). Tasks: Video Quality Assessment, Visual Question Answering (VQA)
MD-VQA: Multi-Dimensional Quality Assessment for UGC Live Videos (Mar 27, 2023). Tasks: Video Quality Assessment, Visual Question Answering (VQA)
Video-Text as Game Players: Hierarchical Banzhaf Interaction for Cross-Modal Representation Learning (Mar 25, 2023). Tasks: Contrastive Learning, Question Answering
MELTR: Meta Loss Transformer for Learning to Fine-tune Video Foundation Models (Mar 23, 2023). Tasks: Auxiliary Learning, Multimodal Sentiment Analysis
Top-Down Visual Attention from Analysis by Synthesis (Mar 23, 2023). Tasks: Retrieval, Semantic Segmentation
TIFA: Accurate and Interpretable Text-to-Image Faithfulness Evaluation with Question Answering (Mar 21, 2023). Tasks: 4k, Image Generation
eP-ALM: Efficient Perceptual Augmentation of Language Models (Mar 20, 2023). Tasks: In-Context Learning, Visual Question Answering (VQA)
VDPVE: VQA Dataset for Perceptual Video Enhancement (Mar 16, 2023). Tasks: Deblurring, valid
Open-Ended Medical Visual Question Answering Through Prefix Tuning of Language Models (Mar 10, 2023). Tasks: Language Modeling, Language Modelling
Prismer: A Vision-Language Model with Multi-Task Experts (Mar 4, 2023). Tasks: Few-Shot Learning, Image Captioning
MixPHM: Redundancy-Aware Parameter-Efficient Tuning for Low-Resource Visual Question Answering (Mar 2, 2023). Tasks: Mixture-of-Experts, Question Answering
BiomedCLIP: a multimodal biomedical foundation model pretrained from fifteen million scientific image-text pairs (Mar 2, 2023). Tasks: Articles, Medical Visual Question Answering
ConTEXTual Net: A Multimodal Vision-Language Model for Segmentation of Pneumothorax (Mar 2, 2023). Tasks: Descriptive, Image Captioning
RAMM: Retrieval-augmented Biomedical Visual Question Answering with Multi-modal Pre-training (Mar 1, 2023). Tasks: Question Answering, Retrieval
Exploring Opinion-unaware Video Quality Assessment with Semantic Affinity Criterion (Feb 26, 2023). Tasks: Video Quality Assessment, Visual Question Answering (VQA)
Can Pre-trained Vision and Language Models Answer Visual Information-Seeking Questions? (Feb 23, 2023). Tasks: Open-Domain Question Answering, Question Answering
Open-domain Visual Entity Recognition: Towards Recognizing Millions of Wikipedia Entities (Feb 22, 2023). Tasks: Entity Linking, Fine-Grained Image Recognition
Towards Unifying Medical Vision-and-Language Pre-training via Soft Prompts (Feb 17, 2023). Tasks: Image Retrieval, Image-text Classification
Multimodal Federated Learning via Contrastive Representation Ensemble (Feb 17, 2023). Tasks: Federated Learning, Image-text Retrieval
UniAdapter: Unified Parameter-Efficient Transfer Learning for Cross-modal Modeling (Feb 13, 2023). Tasks: Image-text Retrieval, Retrieval
Multimodality Representation Learning: A Survey on Evolution, Pretraining and Its Applications (Feb 1, 2023). Tasks: Question Answering, Representation Learning
UPop: Unified and Progressive Pruning for Compressing Vision-Language Transformers (Jan 31, 2023). Tasks: Image Captioning, Image Classification
SlideVQA: A Dataset for Document Visual Question Answering on Multiple Images (Jan 12, 2023). Tasks: Evidence Selection, Question Answering
Multimodal Inverse Cloze Task for Knowledge-based Visual Question Answering (Jan 11, 2023). Tasks: Question Answering, Reading Comprehension
VQACL: A Novel Visual Question Answering Continual Learning Setting (Jan 1, 2023). Tasks: Continual Learning, Question Answering
Variational Causal Inference Network for Explanatory Visual Question Answering (Jan 1, 2023). Tasks: Explanation Generation, Explanatory Visual Question Answering
MIST: Multi-modal Iterative Spatial-Temporal Transformer for Long-form Video Question Answering (Dec 19, 2022). Tasks: Form, Question Answering
Harnessing the Power of Multi-Task Pretraining for Ground-Truth Level Natural Language Explanations (Dec 8, 2022). Tasks: Explanation Generation, Visual Entailment