| GRIT-VLP: Grouped Mini-batch Sampling for Efficient Vision and Language Pre-training | Aug 8, 2022 | Image-text matching, Language Modeling | Code Available | 1 |
| Zero-Shot Video Captioning with Evolving Pseudo-Tokens | Jul 22, 2022 | Image Captioning, Image-text matching | Code Available | 1 |
| Open-Vocabulary Multi-Label Classification via Multi-Modal Knowledge Transfer | Jul 5, 2022 | Image-text matching, Knowledge Distillation | Code Available | 1 |
| A Dense Representation Framework for Lexical and Semantic Matching | Jun 20, 2022 | Retrieval, Semantic Text Matching | Code Available | 1 |
| What is Where by Looking: Weakly-Supervised Open-World Phrase-Grounding without Text Inputs | Jun 19, 2022 | Benchmarking, Image Captioning | Code Available | 1 |
| CCMB: A Large-scale Chinese Cross-modal Benchmark | May 8, 2022 | image-classification, Image Classification | Code Available | 1 |
| Declaration-based Prompt Tuning for Visual Question Answering | May 5, 2022 | Image-text matching, Language Modeling | Code Available | 1 |
| No Token Left Behind: Explainability-Aided Image Classification and Generation | Apr 11, 2022 | image-classification, Image Classification | Code Available | 1 |
| ECCV Caption: Correcting False Negatives by Collecting Machine-and-Human-verified Image-Caption Associations for MS-COCO | Apr 7, 2022 | Image-text matching, Text Matching | Code Available | 1 |
| MVPTR: Multi-Level Semantic Alignment for Vision-Language Pre-Training via Multi-Stage Learning | Jan 29, 2022 | Image-text matching, Language Modeling | Code Available | 1 |
| Negative-Aware Attention Framework for Image-Text Matching | Jan 1, 2022 | Image-text matching, Text Matching | Code Available | 1 |
| DenseCLIP: Language-Guided Dense Prediction with Context-Aware Prompting | Dec 2, 2021 | Image-text matching, Instance Segmentation | Code Available | 1 |
| Learning with Noisy Correspondence for Cross-modal Matching | Dec 1, 2021 | Cross-Modal Retrieval, Cross-modal retrieval with noisy correspondence | Code Available | 1 |
| Object-aware Video-language Pre-training for Retrieval | Dec 1, 2021 | Object, Retrieval | Code Available | 1 |
| Video and Text Matching with Conditioned Embeddings | Oct 21, 2021 | Machine Translation, Sentence | Code Available | 1 |
| ActionCLIP: A New Paradigm for Video Action Recognition | Sep 17, 2021 | Action Classification, Action Recognition | Code Available | 1 |
| HANet: Hierarchical Alignment Networks for Video-Text Retrieval | Jul 26, 2021 | Retrieval, Text Matching | Code Available | 1 |
| Parts2Words: Learning Joint Embedding of Point Clouds and Texts by Bidirectional Matching between Parts and Words | Jul 5, 2021 | Retrieval, Text Matching | Code Available | 1 |
| A Deep Local and Global Scene-Graph Matching for Image-Text Retrieval | Jun 4, 2021 | Graph Matching, Image Retrieval | Code Available | 1 |
| Identifying Machine-Paraphrased Plagiarism | Mar 22, 2021 | Articles, Text Matching | Code Available | 1 |
| LET: Linguistic Knowledge Enhanced Graph Transformer for Chinese Short Text Matching | Feb 25, 2021 | Text Matching | Code Available | 1 |
| Match-Ignition: Plugging PageRank into Transformer for Long-form Text Matching | Jan 16, 2021 | Community Question Answering, Form | Code Available | 1 |
| Similarity Reasoning and Filtration for Image-Text Matching | Jan 5, 2021 | Cross-Modal Retrieval, Image Retrieval | Code Available | 1 |
| Learning Dual Semantic Relations with Graph Attention for Image-Text Matching | Oct 22, 2020 | Cross-Modal Retrieval, Graph Attention | Code Available | 1 |
| MedICaT: A Dataset of Medical Images, Captions, and Textual References | Oct 12, 2020 | document understanding, Image-text matching | Code Available | 1 |
| Universal Weighting Metric Learning for Cross-Modal Matching | Oct 7, 2020 | Image-text matching, Metric Learning | Code Available | 1 |
| MultiCQA: Zero-Shot Transfer of Self-Supervised Text Matching Models on a Massive Scale | Oct 2, 2020 | Answer Selection, Community Question Answering | Code Available | 1 |
| DF-GAN: A Simple and Effective Baseline for Text-to-Image Synthesis | Aug 13, 2020 | Image Generation, Text Matching | Code Available | 1 |
| A Comparison of Supervised Learning to Match Methods for Product Search | Jul 20, 2020 | ARC, Attribute | Code Available | 1 |
| Consensus-Aware Visual-Semantic Embedding for Image-Text Matching | Jul 17, 2020 | Image Captioning, Image-text matching | Code Available | 1 |
| Rationalizing Text Matching: Learning Sparse Alignments via Optimal Transport | May 27, 2020 | Text Matching | Code Available | 1 |
| Machine Reading Comprehension: The Role of Contextualized Language Models and Beyond | May 13, 2020 | Decoder, Machine Reading Comprehension | Code Available | 1 |
| Deep Multimodal Neural Architecture Search | Apr 25, 2020 | Decoder, Image-text matching | Code Available | 1 |
| Transformer Reasoning Network for Image-Text Matching and Retrieval | Apr 20, 2020 | Image Retrieval, Image-text matching | Code Available | 1 |
| Extractive Summarization as Text Matching | Apr 19, 2020 | Document Summarization, Extractive Summarization | Code Available | 1 |
| Text-Guided Neural Image Inpainting | Apr 7, 2020 | Descriptive, Image Generation | Code Available | 1 |
| Pixel-BERT: Aligning Image Pixels with Text by Deep Multi-Modal Transformers | Apr 2, 2020 | Image-text matching, Image-text Retrieval | Code Available | 1 |
| Graph Structured Network for Image-Text Matching | Apr 1, 2020 | Attribute, Cross-Modal Retrieval | Code Available | 1 |
| More Grounded Image Captioning by Distilling Image-Text Matching Model | Apr 1, 2020 | Image Captioning, Image-text matching | Code Available | 1 |
| Keyword-Attentive Deep Semantic Matching | Mar 11, 2020 | Retrieval, Text Matching | Code Available | 1 |
| Adaptive Offline Quintuplet Loss for Image-Text Matching | Mar 7, 2020 | Image-text matching, Text Matching | Code Available | 1 |
| Fine-grained Video-Text Retrieval with Hierarchical Graph Reasoning | Mar 1, 2020 | Cross-Modal Retrieval, Retrieval | Code Available | 1 |
| Knowledge Guided Text Retrieval and Reading for Open Domain Question Answering | Nov 10, 2019 | Natural Questions, Open-Domain Question Answering | Code Available | 1 |
| UNITER: UNiversal Image-TExt Representation Learning | Sep 25, 2019 | Image-text matching, Image-text Retrieval | Code Available | 1 |
| Visual Semantic Reasoning for Image-Text Matching | Sep 6, 2019 | Cross-Modal Retrieval, Image Retrieval | Code Available | 1 |
| Lattice CNNs for Matching Based Chinese Question Answering | Feb 25, 2019 | Diversity, Question Answering | Code Available | 1 |
| Stacked Cross Attention for Image-Text Matching | Mar 21, 2018 | Cross-Modal Retrieval, Image Retrieval | Code Available | 1 |
| AttnGAN: Fine-Grained Text to Image Generation with Attentional Generative Adversarial Networks | Nov 28, 2017 | Generative Adversarial Network, Image Generation | Code Available | 1 |
| Text Matching as Image Recognition | Feb 20, 2016 | Ad-Hoc Information Retrieval, Text Matching | Code Available | 1 |
| TNG-CLIP: Training-Time Negation Data Generation for Negation Awareness of CLIP | May 24, 2025 | Image Captioning, Image Generation | Unverified | 0 |