Improving Medical Visual Representation Learning with Pathological-level Cross-Modal Alignment and Correlation Exploration · Jun 12, 2025 · cross-modal alignment, Image to text · Unverified 0
Efficient Medical Vision-Language Alignment Through Adapting Masked Vision Models · Jun 10, 2025 · Contrastive Learning, Image-text matching · Code Available 1
Towards Cross-modal Retrieval in Chinese Cultural Heritage Documents: Dataset and Solution · May 16, 2025 · Cross-Modal Retrieval, Image to text · Unverified 0
SemCORE: A Semantic-Enhanced Generative Cross-Modal Retrieval Framework with MLLMs · Apr 17, 2025 · Cross-Modal Retrieval, Image Retrieval · Unverified 0
DART: Disease-aware Image-Text Alignment and Self-correcting Re-alignment for Trustworthy Radiology Report Generation · Apr 16, 2025 · Contrastive Learning, Image to text · Unverified 0
ABC: Achieving Better Control of Multimodal Embeddings using VLMs · Mar 1, 2025 · Image to text, Image-to-Text Retrieval · Unverified 0
Retaining Knowledge and Enhancing Long-Text Representations in CLIP through Dual-Teacher Distillation · Jan 1, 2025 · image-classification, Image Classification · Unverified 0
DIR: Retrieval-Augmented Image Captioning with Comprehensive Understanding · Dec 2, 2024 · Caption Generation, Domain Generalization · Unverified 0
Robotic State Recognition with Image-to-Text Retrieval Task of Pre-Trained Vision-Language Model and Black-Box Optimization · Oct 30, 2024 · Image to text, Image-to-Text Retrieval · Unverified 0
Robotic Environmental State Recognition with Pre-Trained Vision-Language Models and Black-Box Optimization · Sep 26, 2024 · Image to text, Image-to-Text Retrieval · Unverified 0
GABInsight: Exploring Gender-Activity Binding Bias in Vision-Language Models · Jul 30, 2024 · Image to text, Image-to-Text Retrieval · Code Available 0
Towards a text-based quantitative and explainable histopathology image analysis · Jul 10, 2024 · image-classification, Image Classification · Code Available 0
BiVLC: Extending Vision-Language Compositionality Evaluation with Text-to-Image Retrieval · Jun 14, 2024 · Image Retrieval, Image to text · Code Available 0
Understanding the Effect of using Semantically Meaningful Tokens for Visual Representation Learning · May 26, 2024 · Image to text, Image-to-Text Retrieval · Unverified 0
Efficient Remote Sensing with Harmonized Transfer Learning and Modality Alignment · Apr 28, 2024 · Cross-Modal Retrieval, Image Retrieval · Code Available 2
CLIP the Bias: How Useful is Balancing Data in Multimodal Learning? · Mar 7, 2024 · Image to text, Image-to-Text Retrieval · Unverified 0
Accept the Modality Gap: An Exploration in the Hyperbolic Space · Jan 1, 2024 · Image to text, Image-to-Text Retrieval · Unverified 0
Linguistic-Aware Patch Slimming Framework for Fine-grained Cross-Modal Alignment · Jan 1, 2024 · cross-modal alignment, Cross-Modal Retrieval · Code Available 2
InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks · Dec 21, 2023 · Image Retrieval, Image-to-Text Retrieval · Code Available 1
Negative Pre-aware for Noisy Cross-modal Matching · Dec 10, 2023 · Cross-modal retrieval with noisy correspondence, Image-text matching · Code Available 1
Prototype-based Aleatoric Uncertainty Quantification for Cross-modal Retrieval · Sep 29, 2023 · Cross-Modal Retrieval, Image-text matching · Code Available 1
Vision-Language Dataset Distillation · Aug 15, 2023 · Dataset Distillation, image-classification · Code Available 1
PRIOR: Prototype Representation Joint Learning from Medical Images and Reports · Jul 24, 2023 · Contrastive Learning, Image to text · Code Available 1
Towards a Visual-Language Foundation Model for Computational Pathology · Jul 24, 2023 · Contrastive Learning, image-classification · Unverified 0
RS5M and GeoRSCLIP: A Large Scale Vision-Language Dataset and A Large Vision-Language Model for Remote Sensing · Jun 20, 2023 · Cross-Modal Retrieval, Image Retrieval · Code Available 2
CrossGET: Cross-Guided Ensemble of Tokens for Accelerating Vision-Language Transformers · May 27, 2023 · Image Captioning, Image Retrieval · Code Available 1
ONE-PEACE: Exploring One General Representation Model Toward Unlimited Modalities · May 18, 2023 · 1 Image, 2*2 Stitchi, Action Classification · Code Available 3
Rethinking Benchmarks for Cross-modal Image-text Retrieval · Apr 21, 2023 · Cross-Modal Retrieval, Image-text Retrieval · Code Available 1
Is Cross-modal Information Retrieval Possible without Training? · Apr 20, 2023 · Contrastive Learning, Cross-Modal Information Retrieval · Unverified 0
Sigmoid Loss for Language Image Pre-Training · Mar 27, 2023 · Contrastive Learning, Disentanglement · Code Available 3
Breaking Common Sense: WHOOPS! A Vision-and-Language Benchmark of Synthetic and Compositional Images · Mar 13, 2023 · Common Sense Reasoning, Explanation Generation · Unverified 0
UPop: Unified and Progressive Pruning for Compressing Vision-Language Transformers · Jan 31, 2023 · Image Captioning, Image Classification · Code Available 1
BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models · Jan 30, 2023 · Generative Visual Question Answering, Image Captioning · Code Available 4
HADA: A Graph-based Amalgamation Framework in Image-text Retrieval · Jan 11, 2023 · Graph Neural Network, Image Retrieval · Code Available 0
When are Lemons Purple? The Concept Association Bias of Vision-Language Models · Dec 22, 2022 · Attribute, image-classification · Unverified 0
A Differentiable Semantic Metric Approximation in Probabilistic Embedding for Cross-Modal Retrieval · Dec 6, 2022 · Cross-Modal Retrieval, Image-text matching · Code Available 1
A survey on knowledge-enhanced multimodal learning · Nov 19, 2022 · Conditional Image Generation, Factual Visual Question Answering · Unverified 0
AltCLIP: Altering the Language Encoder in CLIP for Extended Language Capabilities · Nov 12, 2022 · Contrastive Learning, Cross-Modal Retrieval · Code Available 4
ERNIE-ViL 2.0: Multi-view Contrastive Learning for Image-Text Pre-training · Sep 30, 2022 · Computational Efficiency, Contrastive Learning · Code Available 0
FETA: Towards Specializing Foundation Models for Expert Task Applications · Sep 8, 2022 · Domain Generalization, Few-Shot Learning · Code Available 1
Design of the topology for contrastive visual-textual alignment · Sep 5, 2022 · Contrastive Learning, Image-to-Text Retrieval · Code Available 0
Paired Cross-Modal Data Augmentation for Fine-Grained Image-to-Text Retrieval · Jul 29, 2022 · Cross-Modal Retrieval, Data Augmentation · Unverified 0
Uni-Perceiver-MoE: Learning Sparse Generalist Models with Conditional MoEs · Jun 9, 2022 · Image Captioning, Image Classification · Code Available 2
Crossmodal-3600: A Massively Multilingual Multimodal Evaluation Dataset · May 25, 2022 · Image Captioning, Image Retrieval · Unverified 0
COTS: Collaborative Two-Stream Vision-Language Pre-Training Model for Cross-Modal Retrieval · Apr 15, 2022 · Contrastive Learning, Cross-Modal Retrieval · Unverified 0
IGLUE: A Benchmark for Transfer Learning across Modalities, Tasks, and Languages · Jan 27, 2022 · Cross-Modal Retrieval, Few-Shot Learning · Code Available 1
FLAVA: A Foundational Language And Vision Alignment Model · Dec 8, 2021 · Image Retrieval, Image-to-Text Retrieval · Code Available 1
Align before Fuse: Vision and Language Representation Learning with Momentum Distillation · Jul 16, 2021 · Cross-Modal Retrieval, Grounded language learning · Code Available 1
OPT: Omni-Perception Pre-Trainer for Cross-Modal Understanding and Generation · Jul 1, 2021 · Audio to Text Retrieval, Cross-Modal Retrieval · Code Available 0
A Deep Local and Global Scene-Graph Matching for Image-Text Retrieval · Jun 4, 2021 · Graph Matching, Image Retrieval · Code Available 1