| A Dense Representation Framework for Lexical and Semantic Matching | Jun 20, 2022 | Retrieval, Semantic Text Matching | Code Available | 1 | 5 |
| LLMScore: Unveiling the Power of Large Language Models in Text-to-Image Synthesis Evaluation | May 18, 2023 | Attribute, Image Generation | Code Available | 1 | 5 |
| ColorSwap: A Color and Word Order Dataset for Multimodal Evaluation | Feb 7, 2024 | Image Generation, Image-text matching | Code Available | 1 | 5 |
| Narrative Action Evaluation with Prompt-Guided Multimodal Interaction | Apr 22, 2024 | Action Quality Assessment, multimodal interaction | Code Available | 1 | 5 |
| ActionCLIP: A New Paradigm for Video Action Recognition | Sep 17, 2021 | Action Classification, Action Recognition | Code Available | 1 | 5 |
| ComCLIP: Training-Free Compositional Image and Text Matching | Nov 25, 2022 | Image-text matching, Image-text Retrieval | Code Available | 1 | 5 |
| Composing Object Relations and Attributes for Image-Text Matching | Jun 17, 2024 | Attribute, Graph Attention | Code Available | 1 | 5 |
| Universal Weighting Metric Learning for Cross-Modal Matching | Oct 7, 2020 | Image-text matching, Metric Learning | Code Available | 1 | 5 |
| Consensus-Aware Visual-Semantic Embedding for Image-Text Matching | Jul 17, 2020 | Image Captioning, Image-text matching | Code Available | 1 | 5 |
| Match-Ignition: Plugging PageRank into Transformer for Long-form Text Matching | Jan 16, 2021 | Community Question Answering, Form | Code Available | 1 | 5 |
| Open-Vocabulary Multi-Label Classification via Multi-Modal Knowledge Transfer | Jul 5, 2022 | Image-text matching, Knowledge Distillation | Code Available | 1 | 5 |
| MAP: Multimodal Uncertainty-Aware Vision-Language Pre-training Model | Oct 11, 2022 | Contrastive Learning, Image-text matching | Code Available | 1 | 5 |
| More Grounded Image Captioning by Distilling Image-Text Matching Model | Apr 1, 2020 | Image Captioning, Image-text matching | Code Available | 1 | 5 |
| Efficient Medical Vision-Language Alignment Through Adapting Masked Vision Models | Jun 10, 2025 | Contrastive Learning, Image-text matching | Code Available | 1 | 5 |
| Advancing Visual Grounding with Scene Knowledge: Benchmark and Method | Jul 21, 2023 | Image-text matching, Text Matching | Code Available | 1 | 5 |
| MMoE: Enhancing Multimodal Models with Mixtures of Multimodal Interaction Experts | Nov 16, 2023 | Binary Classification, Descriptive | Code Available | 1 | 5 |
| MultiCQA: Zero-Shot Transfer of Self-Supervised Text Matching Models on a Massive Scale | Oct 2, 2020 | Answer Selection, Community Question Answering | Code Available | 1 | 5 |
| ECCV Caption: Correcting False Negatives by Collecting Machine-and-Human-verified Image-Caption Associations for MS-COCO | Apr 7, 2022 | Image-text matching, Text Matching | Code Available | 1 | 5 |
| AttnGAN: Fine-Grained Text to Image Generation with Attentional Generative Adversarial Networks | Nov 28, 2017 | Generative Adversarial Network, Image Generation | Code Available | 1 | 5 |
| Cross-modal Active Complementary Learning with Self-refining Correspondence | Oct 26, 2023 | Cross-modal retrieval with noisy correspondence, Image-text matching | Code Available | 1 | 5 |
| HANet: Hierarchical Alignment Networks for Video-Text Retrieval | Jul 26, 2021 | Retrieval, Text Matching | Code Available | 1 | 5 |
| Extractive Summarization as Text Matching | Apr 19, 2020 | Document Summarization, Extractive Summarization | Code Available | 1 | 5 |
| Fine-grained Video-Text Retrieval with Hierarchical Graph Reasoning | Mar 1, 2020 | Cross-Modal Retrieval, Retrieval | Code Available | 1 | 5 |
| Fine-Grained Image-Text Matching by Cross-Modal Hard Aligning Network | Jan 1, 2023 | Image-text matching, Retrieval | Code Available | 1 | 5 |
| Negative-Aware Attention Framework for Image-Text Matching | Jan 1, 2022 | Image-text matching, Text Matching | Code Available | 1 | 5 |
| Text-Guided Neural Image Inpainting | Apr 7, 2020 | Descriptive, Image Generation | Code Available | 1 | 5 |
| Declaration-based Prompt Tuning for Visual Question Answering | May 5, 2022 | Image-text matching, Language Modeling | Code Available | 1 | 5 |
| BiCro: Noisy Correspondence Rectification for Multi-modality Data via Bi-directional Cross-modal Similarity Consistency | Mar 22, 2023 | Cross-modal retrieval with noisy correspondence, Image-text matching | Code Available | 1 | 5 |
| BrainCLIP: Bridging Brain and Visual-Linguistic Representation Via CLIP for Generic Natural Visual Stimulus Decoding | Feb 25, 2023 | Brain Decoding, Image Generation | Code Available | 1 | 5 |
| Deep Boosting Learning: A Brand-new Cooperative Approach for Image-Text Matching | Apr 28, 2024 | Contrastive Learning, Image-text matching | Code Available | 1 | 5 |
| A Comparison of Supervised Learning to Match Methods for Product Search | Jul 20, 2020 | ARC, Attribute | Code Available | 1 | 5 |
| Identifying Machine-Paraphrased Plagiarism | Mar 22, 2021 | Articles, Text Matching | Code Available | 1 | 5 |
| MedICaT: A Dataset of Medical Images, Captions, and Textual References | Oct 12, 2020 | document understanding, Image-text matching | Code Available | 1 | 5 |
| Improved Probabilistic Image-Text Representations | May 29, 2023 | Data Augmentation, Image-text matching | Code Available | 1 | 5 |
| Deep Multimodal Neural Architecture Search | Apr 25, 2020 | Decoder, Image-text matching | Code Available | 1 | 5 |
| RadCLIP: Enhancing Radiologic Image Analysis through Contrastive Language-Image Pre-training | Mar 15, 2024 | Diagnostic, image-classification | Code Available | 1 | 5 |
| Multimodal Image-Text Matching Improves Retrieval-based Chest X-Ray Report Generation | Mar 29, 2023 | Image Captioning, Image-text matching | Code Available | 1 | 5 |
| Adaptive Offline Quintuplet Loss for Image-Text Matching | Mar 7, 2020 | Image-text matching, Text Matching | Code Available | 1 | 5 |
| DenseCLIP: Language-Guided Dense Prediction with Context-Aware Prompting | Dec 2, 2021 | Image-text matching, Instance Segmentation | Code Available | 1 | 5 |
| KETM: A Knowledge-Enhanced Text Matching method | Aug 11, 2023 | Common Sense Reasoning, Question Answering | Code Available | 1 | 5 |
| Parts2Words: Learning Joint Embedding of Point Clouds and Texts by Bidirectional Matching between Parts and Words | Jul 5, 2021 | Retrieval, Text Matching | Code Available | 1 | 5 |
| DF-GAN: A Simple and Effective Baseline for Text-to-Image Synthesis | Aug 13, 2020 | Image Generation, Text Matching | Code Available | 1 | 5 |
| CLIP is Strong Enough to Fight Back: Test-time Counterattacks towards Zero-shot Adversarial Robustness of CLIP | Mar 5, 2025 | Adversarial Robustness, Image-text matching | Code Available | 1 | 5 |
| Lattice CNNs for Matching Based Chinese Question Answering | Feb 25, 2019 | Diversity, Question Answering | Code Available | 1 | 5 |
| A Deep Local and Global Scene-Graph Matching for Image-Text Retrieval | Jun 4, 2021 | Graph Matching, Image Retrieval | Code Available | 1 | 5 |
| Discffusion: Discriminative Diffusion Models as Few-shot Vision and Language Learners | May 18, 2023 | Image Generation, Image-text matching | Code Available | 1 | 5 |
| CLIP Under the Microscope: A Fine-Grained Analysis of Multi-Object Representation | Feb 27, 2025 | Image-text matching, Object | Code Available | 1 | 5 |
| Teach CLIP to Develop a Number Sense for Ordinal Regression | Aug 7, 2024 | regression, Text Matching | Code Available | 1 | 5 |
| 3D-STMN: Dependency-Driven Superpoint-Text Matching Network for End-to-End 3D Referring Expression Segmentation | Aug 31, 2023 | Navigate, Referring Expression | Code Available | 1 | 5 |
| A Dual-way Enhanced Framework from Text Matching Point of View for Multimodal Entity Linking | Dec 19, 2023 | Entity Linking, Text Matching | Code Available | 0 | 5 |