Enhancing Recipe Retrieval with Foundation Models: A Data Augmentation Perspective | Dec 8, 2023 | Cross-Modal Retrieval, Data Augmentation | Code Available 1
Uni3DL: Unified Model for 3D and Language Understanding | Dec 5, 2023 | Cross-Modal Retrieval, Instance Segmentation | Unverified 0
T3D: Advancing 3D Medical Vision-Language Pre-training by Learning Multi-View Visual Consistency | Dec 3, 2023 | Clinical Knowledge, Contrastive Learning | Unverified 0
Safe-CLIP: Removing NSFW Concepts from Vision-and-Language Models | Nov 27, 2023 | Cross-Modal Retrieval, Image Generation | Code Available 1
Invisible Relevance Bias: Text-Image Retrieval Models Prefer AI-Generated Images | Nov 23, 2023 | Cross-Modal Retrieval, Image Retrieval | Code Available 0
Contrastive Transformer Learning with Proximity Data Generation for Text-Based Person Search | Nov 15, 2023 | Contrastive Learning, Cross-Modal Retrieval | Code Available 0
Weakly supervised cross-modal learning in high-content screening | Nov 8, 2023 | Cross-Modal Retrieval, Drug Discovery | Code Available 1
BirdSAT: Cross-View Contrastive Masked Autoencoders for Bird Species Classification and Mapping | Oct 29, 2023 | Contrastive Learning, Cross-Modal Retrieval | Code Available 1
A Prior Instruction Representation Framework for Remote Sensing Image-text Retrieval | Oct 27, 2023 | Cross-Modal Retrieval, Image-text Retrieval | Code Available 1
InvGC: Robust Cross-Modal Retrieval by Inverse Graph Convolution | Oct 20, 2023 | Cross-Modal Retrieval, Retrieval | Code Available 0
Two-Stage Triplet Loss Training with Curriculum Augmentation for Audio-Visual Retrieval | Oct 20, 2023 | Cross-Modal Retrieval, Retrieval | Unverified 0
Balance Act: Mitigating Hubness in Cross-Modal Retrieval with Query and Gallery Banks | Oct 17, 2023 | Cross-Modal Retrieval, Retrieval | Code Available 0
PaLI-3 Vision Language Models: Smaller, Faster, Stronger | Oct 13, 2023 | Chart Question Answering, Cross-Modal Retrieval | Code Available 1
Direction-Oriented Visual-semantic Embedding Model for Remote Sensing Image-text Retrieval | Oct 12, 2023 | Cross-Modal Retrieval, Image-text Retrieval | Unverified 0
BioBridge: Bridging Biomedical Foundation Models via Knowledge Graphs | Oct 5, 2023 | Cross-Modal Retrieval, Domain Generalization | Code Available 1
Prototype-based Aleatoric Uncertainty Quantification for Cross-modal Retrieval | Sep 29, 2023 | Cross-Modal Retrieval, Image-text matching | Code Available 1
ELIP: Efficient Language-Image Pre-training with Fewer Vision Tokens | Sep 28, 2023 | Cross-Modal Retrieval, GPU | Code Available 0
Align before Search: Aligning Ads Image to Text for Accurate Cross-Modal Sponsored Search | Sep 28, 2023 | cross-modal alignment, Cross-Modal Retrieval | Code Available 0
Implicit Differentiable Outlier Detection Enable Robust Deep Multimodal Analysis | Sep 21, 2023 | Cross-Modal Retrieval, Image Captioning | Code Available 0
Sound Source Localization is All about Cross-Modal Alignment | Sep 19, 2023 | All, cross-modal alignment | Unverified 0
Learning Tri-modal Embeddings for Zero-Shot Soundscape Mapping | Sep 19, 2023 | Cross-Modal Retrieval | Code Available 1
Dual-view Curricular Optimal Transport for Cross-lingual Cross-modal Retrieval | Sep 11, 2023 | Cross-Lingual Transfer, Cross-Modal Retrieval | Unverified 0
A Survey on Interpretable Cross-modal Reasoning | Sep 5, 2023 | Cross-Modal Retrieval, Decision Making | Code Available 1
Multimodal Foundation Models For Echocardiogram Interpretation | Aug 29, 2023 | Cross-Modal Retrieval, Diagnostic | Code Available 1
Cross-Modal Retrieval Meets Inference: Improving Zero-Shot Classification with Cross-Modal Retrieval | Aug 29, 2023 | Cross-Modal Retrieval, image-classification | Unverified 0
Cross-Modal Retrieval: A Systematic Review of Methods and Future Directions | Aug 28, 2023 | Cross-Modal Retrieval, Retrieval | Code Available 1
Extending Cross-Modal Retrieval with Interactive Learning to Improve Image Retrieval Performance in Forensics | Aug 28, 2023 | Cross-Modal Retrieval, Image Retrieval | Unverified 0
Video and Audio are Images: A Cross-Modal Mixer for Original Data on Video-Audio Retrieval | Aug 26, 2023 | Cross-Modal Retrieval, Decoder | Unverified 0
Parameter-Efficient Transfer Learning for Remote Sensing Image-Text Retrieval | Aug 24, 2023 | Cross-Modal Retrieval, Image-text matching | Code Available 1
An Empirical Study of CLIP for Text-based Person Search | Aug 19, 2023 | Cross-Modal Retrieval, Data Augmentation | Code Available 1
Unifying Two-Stream Encoders with Transformers for Cross-Modal Retrieval | Aug 8, 2023 | Cross-Modal Retrieval, Image Retrieval | Code Available 1
CLIP-KD: An Empirical Study of CLIP Model Distillation | Jul 24, 2023 | Contrastive Learning, Cross-Modal Retrieval | Code Available 1
PiTL: Cross-modal Retrieval with Weakly-supervised Vision-language Pre-training via Prompting | Jul 14, 2023 | Cross-Modal Retrieval, Image to text | Unverified 0
A scoping review on multimodal deep learning in biomedical images and texts | Jul 14, 2023 | Cross-Modal Retrieval, Decision Making | Unverified 0
mCLIP: Multilingual CLIP via Cross-lingual Transfer | Jul 10, 2023 | Contrastive Learning, Cross-Lingual Transfer | Code Available 1
Alternative Telescopic Displacement: An Efficient Multimodal Alignment Method | Jun 29, 2023 | Arrhythmia Detection, Cross-Modal Retrieval | Code Available 0
Cross-modal transformers for infrared and visible image fusion | Jun 26, 2023 | Cross-Modal Retrieval, Depth Estimation | Code Available 1
Quilt-1M: One Million Image-Text Pairs for Histopathology | Jun 20, 2023 | Automatic Speech Recognition, Cross-Modal Retrieval | Code Available 1
RS5M and GeoRSCLIP: A Large Scale Vision-Language Dataset and A Large Vision-Language Model for Remote Sensing | Jun 20, 2023 | Cross-Modal Retrieval, Image Retrieval | Code Available 2
RemoteCLIP: A Vision Language Foundation Model for Remote Sensing | Jun 19, 2023 | Classification, Cross-Modal Retrieval | Code Available 2
Reducing Semantic Confusion: Scene-aware Aggregation Network for Remote Sensing Cross-modal Retrieval | Jun 12, 2023 | Cross-Modal Retrieval, Retrieval | Code Available 1
Youku-mPLUG: A 10 Million Large-scale Chinese Video-Language Dataset for Pre-training and Benchmarks | Jun 7, 2023 | Cross-Modal Retrieval, Language Modelling | Code Available 2
MolFM: A Multimodal Molecular Foundation Model | Jun 6, 2023 | Cross-Modal Retrieval, Knowledge Graphs | Code Available 2
End-to-end Knowledge Retrieval with Multi-modal Queries | Jun 1, 2023 | Benchmarking, Cross-Modal Retrieval | Code Available 1
Dense and Aligned Captions (DAC) Promote Compositional Reasoning in VL Models | May 31, 2023 | Cross-Modal Retrieval, Question Answering | Code Available 1
VAST: A Vision-Audio-Subtitle-Text Omni-Modality Foundation Model and Dataset | May 29, 2023 | Audio captioning, Audio-Visual Captioning | Code Available 2
Multimodal Relation Extraction with Cross-Modal Retrieval and Synthesis | May 25, 2023 | Cross-Modal Retrieval, Object | Unverified 0
Continual Vision-Language Representation Learning with Off-Diagonal Information | May 11, 2023 | Continual Learning, Contrastive Learning | Unverified 0
ImageBind: One Embedding Space To Bind Them All | May 9, 2023 | All, Cross-Modal Retrieval | Code Available 5
Cross-Modal Retrieval for Motion and Text via DopTriple Loss | May 7, 2023 | Cross-Modal Retrieval, Retrieval | Code Available 1