HGAN: Hierarchical Graph Alignment Network for Image-Text Retrieval | Dec 16, 2022 | Image-text Retrieval, Retrieval | Unverified | 0
Retrieval-based Disentangled Representation Learning with Natural Language Supervision | Dec 15, 2022 | Cross-Modal Retrieval, Disentanglement | Unverified | 0
Benchmarking Robustness of Multimodal Image-Text Models under Distribution Shift | Dec 15, 2022 | Benchmarking, Image Captioning | Code Available | 1
FlexiViT: One Model for All Patch Sizes | Dec 15, 2022 | All, Image-text Retrieval | Code Available | 1
Pre-trained Language Models Can be Fully Zero-Shot Learners | Dec 14, 2022 | Retrieval, text-classification | Code Available | 0
NLIP: Noise-robust Language-Image Pre-training | Dec 14, 2022 | Image Captioning, Image-text Retrieval | Unverified | 0
Attentive Deep Neural Networks for Legal Document Retrieval | Dec 13, 2022 | Articles, Question Answering | Unverified | 0
Scale-Semantic Joint Decoupling Network for Image-text Retrieval in Remote Sensing | Dec 12, 2022 | Cross-Modal Retrieval, Image-text Retrieval | Unverified | 0
DialogCC: An Automated Pipeline for Creating High-Quality Multi-Modal Dialogue Dataset | Dec 8, 2022 | Diversity, Image Description | Code Available | 1
Named Entity and Relation Extraction with Multi-Modal Retrieval | Dec 3, 2022 | Mixture-of-Experts, Multi-modal Named Entity Recognition | Unverified | 0
Masked Contrastive Pre-Training for Efficient Video-Text Retrieval | Dec 2, 2022 | Image-text Retrieval, Retrieval | Unverified | 0
Dense Text Retrieval based on Pretrained Language Models: A Survey | Nov 27, 2022 | Retrieval, Survey | Code Available | 2
ComCLIP: Training-Free Compositional Image and Text Matching | Nov 25, 2022 | Image-text matching, Image-text Retrieval | Code Available | 1
Seeing What You Miss: Vision-Language Pre-training with Semantic Completion Learning | Nov 24, 2022 | cross-modal alignment, Image-text Retrieval | Code Available | 1
MSLKANet: A Multi-Scale Large Kernel Attention Network for Scene Text Removal | Nov 12, 2022 | Retrieval, Text Retrieval | Unverified | 0
On Negative Sampling for Contrastive Audio-Text Retrieval | Nov 8, 2022 | Audio to Text Retrieval, Contrastive Learning | Unverified | 0
Arabic Text Mining | Nov 4, 2022 | Retrieval, text-classification | Unverified | 0
M-SpeechCLIP: Leveraging Large-Scale, Pre-Trained Models for Multilingual Speech to Image Retrieval | Nov 2, 2022 | Image Retrieval, Retrieval | Unverified | 0
Exploring Train and Test-Time Augmentations for Audio-Language Learning | Oct 31, 2022 | Audio captioning, Audio to Text Retrieval | Unverified | 0
Generative Negative Text Replay for Continual Vision-Language Pretraining | Oct 31, 2022 | Continual Learning, image-classification | Unverified | 0
COCO-DR: Combating Distribution Shifts in Zero-Shot Dense Retrieval with Contrastive and Distributionally Robust Learning | Oct 27, 2022 | Language Modeling, Language Modelling | Code Available | 1
RSVG: Exploring Data and Models for Visual Grounding on Remote Sensing Data | Oct 23, 2022 | Image Captioning, Image-text Retrieval | Unverified | 0
Dissecting Deep Metric Learning Losses for Image-Text Retrieval | Oct 21, 2022 | Cross-Modal Retrieval, Image-text matching | Code Available | 0
SimANS: Simple Ambiguous Negatives Sampling for Dense Text Retrieval | Oct 21, 2022 | Retrieval, Text Retrieval | Unverified | 0
An Analysis of Fusion Functions for Hybrid Retrieval | Oct 21, 2022 | Retrieval, Text Retrieval | Unverified | 0
Image-Text Retrieval with Binary and Continuous Label Supervision | Oct 20, 2022 | Image Captioning, Image-text Retrieval | Unverified | 0
VTC: Improving Video-Text Retrieval with User Comments | Oct 19, 2022 | Representation Learning, Retrieval | Code Available | 1
MedCLIP: Contrastive Learning from Unpaired Medical Images and Text | Oct 18, 2022 | Contrastive Learning, Image-text Retrieval | Code Available | 2
Vision-Language Pre-training: Basics, Recent Advances, and Future Trends | Oct 17, 2022 | Few-Shot Learning, Image Captioning | Code Available | 3
MTEB: Massive Text Embedding Benchmark | Oct 13, 2022 | Benchmarking, Information Retrieval | Code Available | 4
Mixed-modality Representation Learning and Pre-training for Joint Table-and-Text Retrieval in OpenQA | Oct 11, 2022 | Open-Domain Question Answering, Question Answering | Code Available | 1
MAP: Multimodal Uncertainty-Aware Vision-Language Pre-training Model | Oct 11, 2022 | Contrastive Learning, Image-text matching | Code Available | 1
MAMO: Masked Multimodal Modeling for Fine-Grained Vision-Language Representation Learning | Oct 9, 2022 | Image-text Retrieval, multimodal interaction | Unverified | 0
Learning to embed semantic similarity for joint image-text retrieval | Oct 7, 2022 | Image-text Retrieval, Metric Learning | Unverified | 0
Nonparametric Decoding for Generative Retrieval | Oct 5, 2022 | Decoder, Language Modelling | Code Available | 1
SpeechCLIP: Integrating Speech with Pre-Trained Vision and Language Model | Oct 3, 2022 | Language Modeling, Language Modelling | Code Available | 1
Label Smoothing for Text Mining | Oct 1, 2022 | Retrieval, text-classification | Unverified | 0
Efficient Multilingual Multi-modal Pre-training through Triple Contrastive Loss | Oct 1, 2022 | image-classification, Image Classification | Unverified | 0
DecAF: Joint Decoding of Answers and Logical Forms for Question Answering over Knowledge Bases | Sep 30, 2022 | Entity Linking, Question Answering | Code Available | 1
Re-Imagen: Retrieval-Augmented Text-to-Image Generator | Sep 29, 2022 | Image Generation, Image-text Retrieval | Unverified | 0
TokenFlow: Rethinking Fine-grained Cross-modal Alignment in Vision-Language Retrieval | Sep 28, 2022 | cross-modal alignment, Retrieval | Unverified | 0
Mr. Right: Multimodal Retrieval on Representation of ImaGe witH Text | Sep 28, 2022 | Image Captioning, Image Retrieval | Code Available | 1
Audio Retrieval with WavText5K and CLAP Training | Sep 28, 2022 | AudioCaps, Audio captioning | Code Available | 1
Unified Loss of Pair Similarity Optimization for Vision-Language Retrieval | Sep 28, 2022 | Contrastive Learning, Retrieval | Unverified | 0
Text-Adaptive Multiple Visual Prototype Matching for Video-Text Retrieval | Sep 27, 2022 | Cross-Modal Retrieval, Retrieval | Unverified | 0
OmniVL: One Foundation Model for Image-Language and Video-Language Tasks | Sep 15, 2022 | Action Classification, Action Recognition | Unverified | 0
CLIP-ViP: Adapting Pre-trained Image-Text Model to Video-Language Representation Alignment | Sep 14, 2022 | Retrieval, Text Retrieval | Code Available | 2
Unified Generative & Dense Retrieval for Query Rewriting in Sponsored Search | Sep 13, 2022 | Retrieval, Text Generation | Unverified | 0
VL-Taboo: An Analysis of Attribute-based Zero-shot Capabilities of Vision-Language Models | Sep 12, 2022 | Attribute, Image-text Retrieval | Code Available | 0
FETA: Towards Specializing Foundation Models for Expert Task Applications | Sep 8, 2022 | Domain Generalization, Few-Shot Learning | Code Available | 1