Multi-event Video-Text Retrieval | Aug 22, 2023 | Language Modelling Retrieval | Code Available | 1
ALIP: Adaptive Language-Image Pre-training with Synthetic Caption | Aug 16, 2023 | Action Classification Image-text Retrieval | Code Available | 1
Helping Hands: An Object-Aware Ego-Centric Video Recognition Model | Aug 15, 2023 | Decoder Object | Code Available | 1
Vision-Language Dataset Distillation | Aug 15, 2023 | Dataset Distillation image-classification | Code Available | 1
AdvCLIP: Downstream-agnostic Adversarial Examples in Multimodal Contrastive Learning | Aug 14, 2023 | Contrastive Learning Generative Adversarial Network | Code Available | 1
Free-ATM: Exploring Unsupervised Learning on Diffusion-Generated Images with Free Attention Masks | Aug 13, 2023 | Contrastive Learning image-classification | Unverified | 0
Embedding-based Retrieval with LLM for Effective Agriculture Information Extracting from Unstructured Data | Aug 6, 2023 | Language Modeling Language Modelling | Unverified | 0
The All-Seeing Project: Towards Panoptic Visual Recognition and Understanding of the Open World | Aug 3, 2023 | All Question Answering | Code Available | 2
Defense of Adversarial Ranking Attack in Text Retrieval: Benchmark and Baseline via Detection | Jul 31, 2023 | Adversarial Attack Information Retrieval | Unverified | 0
Set-level Guidance Attack: Boosting Adversarial Transferability of Vision-Language Pre-training Models | Jul 26, 2023 | Image-text Retrieval Retrieval | Code Available | 1
PRIOR: Prototype Representation Joint Learning from Medical Images and Reports | Jul 24, 2023 | Contrastive Learning Image to text | Code Available | 1
Towards a Visual-Language Foundation Model for Computational Pathology | Jul 24, 2023 | Contrastive Learning image-classification | Unverified | 0
Extracting Molecular Properties from Natural Language with Multimodal Contrastive Learning | Jul 22, 2023 | Contrastive Learning Property Prediction | Unverified | 0
Distilling Knowledge from Text-to-Image Generative Models Improves Visio-Linguistic Reasoning in CLIP | Jul 18, 2023 | Attribute Image-text Retrieval | Unverified | 0
mCLIP: Multilingual CLIP via Cross-lingual Transfer | Jul 10, 2023 | Contrastive Learning Cross-Lingual Transfer | Code Available | 1
Stop Pre-Training: Adapt Visual-Language Models to Unseen Languages | Jun 29, 2023 | Image-text Retrieval Machine Translation | Code Available | 0
Learning to Rank in Generative Retrieval | Jun 27, 2023 | Learning-To-Rank Passage Ranking | Code Available | 1
Switch-BERT: Learning to Model Multimodal Interactions by Switching Attention and Input | Jun 25, 2023 | Diversity Image-text Retrieval | Unverified | 0
TaCA: Upgrading Your Visual Foundation Model with Task-agnostic Compatible Adapter | Jun 22, 2023 | Question Answering Retrieval | Code Available | 0
RS5M and GeoRSCLIP: A Large Scale Vision-Language Dataset and A Large Vision-Language Model for Remote Sensing | Jun 20, 2023 | Cross-Modal Retrieval Image Retrieval | Code Available | 2
MSVD-Indonesian: A Benchmark for Multimodal Video-Text Tasks in Indonesian | Jun 20, 2023 | Cross-Lingual Transfer Retrieval | Code Available | 0
Align, Adapt and Inject: Sound-guided Unified Image Generation | Jun 20, 2023 | Image Generation Retrieval | Unverified | 0
RemoteCLIP: A Vision Language Foundation Model for Remote Sensing | Jun 19, 2023 | Classification Cross-Modal Retrieval | Code Available | 2
Contrasting Intra-Modal and Ranking Cross-Modal Hard Negatives to Enhance Visio-Linguistic Compositional Understanding | Jun 15, 2023 | Contrastive Learning image-classification | Code Available | 1
Efficient Token-Guided Image-Text Retrieval with Consistent Multimodal Contrastive Training | Jun 15, 2023 | Image-text Retrieval Representation Learning | Code Available | 1
Babel-ImageNet: Massively Multilingual Evaluation of Vision-and-Language Representations | Jun 14, 2023 | image-classification Image Classification | Code Available | 1
h2oGPT: Democratizing Large Language Models | Jun 13, 2023 | Chatbot Fairness | Code Available | 6
Global and Local Semantic Completion Learning for Vision-Language Pre-training | Jun 12, 2023 | cross-modal alignment Image-text Retrieval | Code Available | 1
Multi-modal Pre-training for Medical Vision-language Understanding and Generation: An Empirical Study with A New Benchmark | Jun 10, 2023 | Image-text Retrieval Medical Report Generation | Code Available | 1
Revisiting the Role of Language Priors in Vision-Language Models | Jun 2, 2023 | Image-text matching Image-text Retrieval | Code Available | 1
Test-Time Adaptation with CLIP Reward for Zero-Shot Generalization in Vision-Language Models | May 29, 2023 | Image Captioning Image Classification | Code Available | 1
FuseCap: Leveraging Large Language Models for Enriched Fused Image Captions | May 28, 2023 | Attribute Image Captioning | Code Available | 1
CrossGET: Cross-Guided Ensemble of Tokens for Accelerating Vision-Language Transformers | May 27, 2023 | Image Captioning Image Retrieval | Code Available | 1
Integrating Listwise Ranking into Pairwise-based Image-Text Retrieval | May 26, 2023 | Image-text Retrieval Retrieval | Code Available | 0
Enhancing the Ranking Context of Dense Retrieval Methods through Reciprocal Nearest Neighbors | May 25, 2023 | Contrastive Learning Reranking | Code Available | 0
PaCE: Unified Multi-modal Dialogue Pre-training with Progressive and Compositional Experts | May 24, 2023 | Dialogue State Tracking Image Retrieval | Code Available | 0
S-CLIP: Semi-supervised Vision-Language Learning using Few Specialist Captions | May 23, 2023 | Contrastive Learning Image-text Retrieval | Code Available | 1
When the Music Stops: Tip-of-the-Tongue Retrieval for Music | May 23, 2023 | Benchmarking Language Modeling | Code Available | 0
i-Code Studio: A Configurable and Composable Framework for Integrative AI | May 23, 2023 | Question Answering Retrieval | Unverified | 0
VLAB: Enhancing Video Language Pre-training by Feature Adapting and Blending | May 22, 2023 | Question Answering Retrieval | Unverified | 0
Enhancing Vision-Language Pre-Training with Jointly Learned Questioner and Dense Captioner | May 19, 2023 | Dense Captioning Image Captioning | Code Available | 1
TOME: A Two-stage Approach for Model-based Retrieval | May 18, 2023 | Natural Questions Retrieval | Unverified | 0
ONE-PEACE: Exploring One General Representation Model Toward Unlimited Modalities | May 18, 2023 | 1 Image, 2*2 Stitchi Action Classification | Code Available | 3
Mask to reconstruct: Cooperative Semantics Completion for Video-text Retrieval | May 13, 2023 | Retrieval Text Retrieval | Unverified | 0
Region-Aware Pretraining for Open-Vocabulary Object Detection with Vision Transformers | May 11, 2023 | Contrastive Learning Image-text Retrieval | Code Available | 1
Alternating Gradient Descent and Mixture-of-Experts for Integrated Multimodal Perception | May 10, 2023 | Classification image-classification | Unverified | 0
Cross-Modal Retrieval for Motion and Text via DopTriple Loss | May 7, 2023 | Cross-Modal Retrieval Retrieval | Code Available | 1
Understanding Differential Search Index for Text Retrieval | May 3, 2023 | Information Retrieval Retrieval | Code Available | 1
From Association to Generation: Text-only Captioning by Unsupervised Cross-modal Mapping | Apr 26, 2023 | Decoder Image Captioning | Code Available | 1
Hypernymization of named entity-rich captions for grounding-based multi-modal pretraining | Apr 25, 2023 | Articles Image-text Retrieval | Unverified | 0