| A Dense Representation Framework for Lexical and Semantic Matching | Jun 20, 2022 | Retrieval, Semantic Text Matching | Code Available | 1 |
| Text Matching as Image Recognition | Feb 20, 2016 | Ad-Hoc Information Retrieval, Text Matching | Code Available | 1 |
| ColorSwap: A Color and Word Order Dataset for Multimodal Evaluation | Feb 7, 2024 | Image Generation, Image-text matching | Code Available | 1 |
| Lattice CNNs for Matching Based Chinese Question Answering | Feb 25, 2019 | Diversity, Question Answering | Code Available | 1 |
| ActionCLIP: A New Paradigm for Video Action Recognition | Sep 17, 2021 | Action Classification, Action Recognition | Code Available | 1 |
| ComCLIP: Training-Free Compositional Image and Text Matching | Nov 25, 2022 | Image-text matching, Image-text Retrieval | Code Available | 1 |
| Composing Object Relations and Attributes for Image-Text Matching | Jun 17, 2024 | Attribute, Graph Attention | Code Available | 1 |
| UNITER: UNiversal Image-TExt Representation Learning | Sep 25, 2019 | Image-text matching, Image-text Retrieval | Code Available | 1 |
| Consensus-Aware Visual-Semantic Embedding for Image-Text Matching | Jul 17, 2020 | Image Captioning, Image-text matching | Code Available | 1 |
| Video and Text Matching with Conditioned Embeddings | Oct 21, 2021 | Machine Translation, Sentence | Code Available | 1 |
| LLMScore: Unveiling the Power of Large Language Models in Text-to-Image Synthesis Evaluation | May 18, 2023 | Attribute, Image Generation | Code Available | 1 |
| Visual Semantic Reasoning for Image-Text Matching | Sep 6, 2019 | Cross-Modal Retrieval, Image Retrieval | Code Available | 1 |
| Identifying Machine-Paraphrased Plagiarism | Mar 22, 2021 | Articles, Text Matching | Code Available | 1 |
| Graph Structured Network for Image-Text Matching | Apr 1, 2020 | Attribute, Cross-Modal Retrieval | Code Available | 1 |
| Advancing Visual Grounding with Scene Knowledge: Benchmark and Method | Jul 21, 2023 | Image-text matching, Text Matching | Code Available | 1 |
| GRIT-VLP: Grouped Mini-batch Sampling for Efficient Vision and Language Pre-training | Aug 8, 2022 | Image-text matching, Language Modeling | Code Available | 1 |
| Image-text matching for large-scale book collections | Jul 29, 2024 | Image-text matching, Optical Character Recognition (OCR) | Code Available | 1 |
| AttnGAN: Fine-Grained Text to Image Generation with Attentional Generative Adversarial Networks | Nov 28, 2017 | Generative Adversarial Network, Image Generation | Code Available | 1 |
| HANet: Hierarchical Alignment Networks for Video-Text Retrieval | Jul 26, 2021 | Retrieval, Text Matching | Code Available | 1 |
| Cross-modal Active Complementary Learning with Self-refining Correspondence | Oct 26, 2023 | Cross-modal retrieval with noisy correspondence, Image-text matching | Code Available | 1 |
| Fine-Grained Image-Text Matching by Cross-Modal Hard Aligning Network | Jan 1, 2023 | Image-text matching, Retrieval | Code Available | 1 |
| IteRPrimE: Zero-shot Referring Image Segmentation with Iterative Grad-CAM Refinement and Primary Word Emphasis | Mar 2, 2025 | Image Segmentation, Image-text matching | Code Available | 1 |
| Extractive Summarization as Text Matching | Apr 19, 2020 | Document Summarization, Extractive Summarization | Code Available | 1 |
| Knowledge Guided Text Retrieval and Reading for Open Domain Question Answering | Nov 10, 2019 | Natural Questions, Open-Domain Question Answering | Code Available | 1 |
| Fine-grained Video-Text Retrieval with Hierarchical Graph Reasoning | Mar 1, 2020 | Cross-Modal Retrieval, Retrieval | Code Available | 1 |
| Learning Semantic Relationship Among Instances for Image-Text Matching | Jan 1, 2023 | Cross-Modal Retrieval, Image Retrieval | Code Available | 1 |
| Declaration-based Prompt Tuning for Visual Question Answering | May 5, 2022 | Image-text matching, Language Modeling | Code Available | 1 |
| BiCro: Noisy Correspondence Rectification for Multi-modality Data via Bi-directional Cross-modal Similarity Consistency | Mar 22, 2023 | Cross-modal retrieval with noisy correspondence, Image-text matching | Code Available | 1 |
| BrainCLIP: Bridging Brain and Visual-Linguistic Representation Via CLIP for Generic Natural Visual Stimulus Decoding | Feb 25, 2023 | Brain Decoding, Image Generation | Code Available | 1 |
| Deep Boosting Learning: A Brand-new Cooperative Approach for Image-Text Matching | Apr 28, 2024 | Contrastive Learning, Image-text matching | Code Available | 1 |
| A Comparison of Supervised Learning to Match Methods for Product Search | Jul 20, 2020 | ARC, Attribute | Code Available | 1 |
| Efficient Medical Vision-Language Alignment Through Adapting Masked Vision Models | Jun 10, 2025 | Contrastive Learning, Image-text matching | Code Available | 1 |
| Improved Probabilistic Image-Text Representations | May 29, 2023 | Data Augmentation, Image-text matching | Code Available | 1 |
| MedICaT: A Dataset of Medical Images, Captions, and Textual References | Oct 12, 2020 | document understanding, Image-text matching | Code Available | 1 |
| Machine Reading Comprehension: The Role of Contextualized Language Models and Beyond | May 13, 2020 | Decoder, Machine Reading Comprehension | Code Available | 1 |
| More Grounded Image Captioning by Distilling Image-Text Matching Model | Apr 1, 2020 | Image Captioning, Image-text matching | Code Available | 1 |
| RadCLIP: Enhancing Radiologic Image Analysis through Contrastive Language-Image Pre-training | Mar 15, 2024 | Diagnostic, image-classification | Code Available | 1 |
| Adaptive Offline Quintuplet Loss for Image-Text Matching | Mar 7, 2020 | Image-text matching, Text Matching | Code Available | 1 |
| DenseCLIP: Language-Guided Dense Prediction with Context-Aware Prompting | Dec 2, 2021 | Image-text matching, Instance Segmentation | Code Available | 1 |
| Narrative Action Evaluation with Prompt-Guided Multimodal Interaction | Apr 22, 2024 | Action Quality Assessment, multimodal interaction | Code Available | 1 |
| No Token Left Behind: Explainability-Aided Image Classification and Generation | Apr 11, 2022 | image-classification, Image Classification | Code Available | 1 |
| DF-GAN: A Simple and Effective Baseline for Text-to-Image Synthesis | Aug 13, 2020 | Image Generation, Text Matching | Code Available | 1 |
| CLIP is Strong Enough to Fight Back: Test-time Counterattacks towards Zero-shot Adversarial Robustness of CLIP | Mar 5, 2025 | Adversarial Robustness, Image-text matching | Code Available | 1 |
| Open-Vocabulary Multi-Label Classification via Multi-Modal Knowledge Transfer | Jul 5, 2022 | Image-text matching, Knowledge Distillation | Code Available | 1 |
| A Deep Local and Global Scene-Graph Matching for Image-Text Retrieval | Jun 4, 2021 | Graph Matching, Image Retrieval | Code Available | 1 |
| Discffusion: Discriminative Diffusion Models as Few-shot Vision and Language Learners | May 18, 2023 | Image Generation, Image-text matching | Code Available | 1 |
| CLIP Under the Microscope: A Fine-Grained Analysis of Multi-Object Representation | Feb 27, 2025 | Image-text matching, Object | Code Available | 1 |
| Revisiting Deep Audio-Text Retrieval Through the Lens of Transportation | May 16, 2024 | AudioCaps, Event Detection | Code Available | 1 |
| 3D-STMN: Dependency-Driven Superpoint-Text Matching Network for End-to-End 3D Referring Expression Segmentation | Aug 31, 2023 | Navigate, Referring Expression | Code Available | 1 |
| Content-Based Image Retrieval for Multi-Class Volumetric Radiology Images: A Benchmark Study | May 15, 2024 | Content-Based Image Retrieval, Image Retrieval | Unverified | 0 |