BrainCLIP: Bridging Brain and Visual-Linguistic Representation Via CLIP for Generic Natural Visual Stimulus Decoding | Feb 25, 2023 | Brain Decoding, Image Generation | Code Available | 1
ECCV Caption: Correcting False Negatives by Collecting Machine-and-Human-verified Image-Caption Associations for MS-COCO | Apr 7, 2022 | Image-text matching, Text Matching | Code Available | 1
UniFine: A Unified and Fine-grained Approach for Zero-shot Vision-Language Understanding | Jul 3, 2023 | Image-text matching, Sentence | Code Available | 1
Efficient Medical Vision-Language Alignment Through Adapting Masked Vision Models | Jun 10, 2025 | Contrastive Learning, Image-text matching | Code Available | 1
Advancing Visual Grounding with Scene Knowledge: Benchmark and Method | Jul 21, 2023 | Image-text matching, Text Matching | Code Available | 1
UGNCL: Uncertainty-Guided Noisy Correspondence Learning for Efficient Cross-Modal Matching | Jul 11, 2024 | Cross-Modal Retrieval, Cross-modal retrieval with noisy correspondence | Code Available | 1
Zero-Shot Video Captioning with Evolving Pseudo-Tokens | Jul 22, 2022 | Image Captioning, Image-text matching | Code Available | 1
Transformer Reasoning Network for Image-Text Matching and Retrieval | Apr 20, 2020 | Image Retrieval, Image-text matching | Code Available | 1
CLIP is Strong Enough to Fight Back: Test-time Counterattacks towards Zero-shot Adversarial Robustness of CLIP | Mar 5, 2025 | Adversarial Robustness, Image-text matching | Code Available | 1
Align before Fuse: Vision and Language Representation Learning with Momentum Distillation | Jul 16, 2021 | Cross-Modal Retrieval, Grounded language learning | Code Available | 1
Fine-Grained Image-Text Matching by Cross-Modal Hard Aligning Network | Jan 1, 2023 | Image-text matching, Retrieval | Code Available | 1
CLIP Under the Microscope: A Fine-Grained Analysis of Multi-Object Representation | Feb 27, 2025 | Image-text matching, Object | Code Available | 1
Towards Unified Text-based Person Retrieval: A Large-scale Multi-Attribute and Language Search Benchmark | Jun 5, 2023 | Attribute, Image-text matching | Code Available | 1
Graph Structured Network for Image-Text Matching | Apr 1, 2020 | Attribute, Cross-Modal Retrieval | Code Available | 1
Synthesize, Diagnose, and Optimize: Towards Fine-Grained Vision-Language Understanding | Nov 30, 2023 | Attribute, Compositional Zero-Shot Learning | Code Available | 1
GRIT-VLP: Grouped Mini-batch Sampling for Efficient Vision and Language Pre-training | Aug 8, 2022 | Image-text matching, Language Modeling | Code Available | 1
ColorSwap: A Color and Word Order Dataset for Multimodal Evaluation | Feb 7, 2024 | Image Generation, Image-text matching | Code Available | 1
Composing Object Relations and Attributes for Image-Text Matching | Jun 17, 2024 | Attribute, Graph Attention | Code Available | 1
ComCLIP: Training-Free Compositional Image and Text Matching | Nov 25, 2022 | Image-text matching, Image-text Retrieval | Code Available | 1
Visual Semantic Reasoning for Image-Text Matching | Sep 6, 2019 | Cross-Modal Retrieval, Image Retrieval | Code Available | 1
Image-text matching for large-scale book collections | Jul 29, 2024 | Image-text matching, Optical Character Recognition (OCR) | Code Available | 1
Consensus-Aware Visual-Semantic Embedding for Image-Text Matching | Jul 17, 2020 | Image Captioning, Image-text matching | Code Available | 1
Improved Probabilistic Image-Text Representations | May 29, 2023 | Data Augmentation, Image-text matching | Code Available | 1
Structure-CLIP: Towards Scene Graph Knowledge to Enhance Multi-modal Structured Representations | May 6, 2023 | Image-text matching, Text Matching | Code Available | 1
Stacked Cross Attention for Image-Text Matching | Mar 21, 2018 | Cross-Modal Retrieval, Image Retrieval | Code Available | 1
CCMB: A Large-scale Chinese Cross-modal Benchmark | May 8, 2022 | image-classification, Image Classification | Code Available | 1
Similarity Reasoning and Filtration for Image-Text Matching | Jan 5, 2021 | Cross-Modal Retrieval, Image Retrieval | Code Available | 1
Unified Multimodal Pre-training and Prompt-based Tuning for Vision-Language Understanding and Generation | Dec 10, 2021 | Image-text matching, Image-text Retrieval | Unverified | 0
MedUnifier: Unifying Vision-and-Language Pre-training on Medical Data with Vision Generation Task using Discrete Visual Representations | Mar 2, 2025 | image-classification, Image Classification | Unverified | 0
A Concept-Centric Approach to Multi-Modality Learning | Dec 18, 2024 | Image-text matching, Question Answering | Unverified | 0
Active Mining Sample Pair Semantics for Image-text Matching | Nov 9, 2023 | Active Learning, Image-text matching | Unverified | 0
AdsCVLR: Commercial Visual-Linguistic Representation Modeling in Sponsored Search | Oct 10, 2022 | Contrastive Learning, Image-text matching | Unverified | 0
Advanced Multimodal Deep Learning Architecture for Image-Text Matching | Jun 13, 2024 | Deep Learning, Image-text matching | Unverified | 0
A Novel Attention-based Aggregation Function to Combine Vision and Language | Apr 27, 2020 | General Classification, Image Captioning | Unverified | 0
A Self-Boosting Framework for Automated Radiographic Report Generation | Jun 19, 2021 | Image Captioning, Image-text matching | Unverified | 0
Automatic Prompt Generation and Grounding Object Detection for Zero-Shot Image Anomaly Detection | Nov 28, 2024 | Anomaly Detection, Image-text matching | Unverified | 0
Breaking Through the Noisy Correspondence: A Robust Model for Image-Text Matching | Apr 29, 2024 | Cross-modal retrieval with noisy correspondence, Image-text matching | Unverified | 0
Bridging the Modality Gap: Dimension Information Alignment and Sparse Spatial Constraint for Image-Text Matching | Oct 22, 2024 | Contrastive Learning, Image-text matching | Unverified | 0
CLIP-Powered TASS: Target-Aware Single-Stream Network for Audio-Visual Question Answering | May 13, 2024 | Audio-visual Question Answering, Audio-Visual Question Answering (AVQA) | Unverified | 0
Constructing Multilingual Visual-Text Datasets Revealing Visual Multilingual Ability of Vision Language Models | Mar 29, 2024 | Image-text matching, Object Recognition | Unverified | 0
A New Fine-grained Alignment Method for Image-text Matching | Nov 3, 2023 | Image-text matching, Image-text Retrieval | Unverified | 0
Cross-modal Subspace Learning for Fine-grained Sketch-based Image Retrieval | May 28, 2017 | Cross-Modal Retrieval, Image Retrieval | Unverified | 0
CILF-CIAE: CLIP-driven Image-Language Fusion for Correcting Inverse Age Estimation | Dec 4, 2023 | Age Estimation, Image-text matching | Unverified | 0
DARE: Diverse Visual Question Answering with Robustness Evaluation | Sep 26, 2024 | image-classification, Image Classification | Unverified | 0
DEMO: A Statistical Perspective for Efficient Image-Text Matching | May 19, 2024 | Image-text matching, Model Optimization | Unverified | 0
Dependency Structure Augmented Contextual Scoping Framework for Multimodal Aspect-Based Sentiment Analysis | Apr 15, 2025 | Aspect-Based Sentiment Analysis, Dependency Parsing | Unverified | 0
Descriptive Image-Text Matching with Graded Contextual Similarity | May 15, 2025 | Descriptive, Image-text matching | Unverified | 0
Discrete-continuous Action Space Policy Gradient-based Attention for Image-Text Matching | Apr 21, 2021 | Image-text matching, Text Matching | Unverified | 0
Don't Stop Learning: Towards Continual Learning for the CLIP Model | Jul 19, 2022 | Continual Learning, Image-text matching | Unverified | 0
DT2I: Dense Text-to-Image Generation from Region Descriptions | Apr 5, 2022 | Conditional Image Generation, Image Generation | Unverified | 0