| Selectively Hard Negative Mining for Alleviating Gradient Vanishing in Image-Text Matching | Mar 1, 2023 | Image-text matching, Text Matching | Unverified | 0 |
| BrainCLIP: Bridging Brain and Visual-Linguistic Representation Via CLIP for Generic Natural Visual Stimulus Decoding | Feb 25, 2023 | Brain Decoding, Image Generation | Code Available | 1 |
| Co-Driven Recognition of Semantic Consistency via the Fusion of Transformer and HowNet Sememes Knowledge | Feb 21, 2023 | Paraphrase Identification, Sentence | Code Available | 0 |
| Unified Vision-Language Representation Modeling for E-Commerce Same-Style Products Retrieval | Feb 10, 2023 | Attribute, Language Modeling | Unverified | 0 |
| Tagging before Alignment: Integrating Multi-Modal Tags for Video-Text Retrieval | Jan 30, 2023 | Language Modeling, Language Modelling | Unverified | 0 |
| Improving Zero-Shot Action Recognition using Human Instruction with Text Description | Jan 21, 2023 | Action Recognition, Sentence | Unverified | 0 |
| Weakly Supervised Referring Image Segmentation with Intra-Chunk and Inter-Chunk Consistency | Jan 1, 2023 | Image Segmentation, Image-text matching | Unverified | 0 |
| VL-Match: Enhancing Vision-Language Pretraining with Token-Level and Instance-Level Matching | Jan 1, 2023 | Image-text matching, Image-text Retrieval | Unverified | 0 |
| ShapeScaffolder: Structure-Aware 3D Shape Generation from Text | Jan 1, 2023 | 3D Shape Generation, Text Matching | Unverified | 0 |
| Fine-Grained Image-Text Matching by Cross-Modal Hard Aligning Network | Jan 1, 2023 | Image-text matching, Retrieval | Code Available | 1 |
| RefCLIP: A Universal Teacher for Weakly Supervised Referring Expression Comprehension | Jan 1, 2023 | Referring Expression, Referring Expression Comprehension | Unverified | 0 |
| Learning Semantic Relationship Among Instances for Image-Text Matching | Jan 1, 2023 | Cross-Modal Retrieval, Image Retrieval | Code Available | 1 |
| Multimodal Matching-aware Co-attention Networks with Mutual Knowledge Distillation for Fake News Detection | Dec 12, 2022 | Fake News Detection, Image-text matching | Unverified | 0 |
| Uniform Masking Prevails in Vision-Language Pretraining | Dec 10, 2022 | Image-text matching, Language Modeling | Unverified | 0 |
| SimVTP: Simple Video Text Pre-training with Masked Autoencoders | Dec 7, 2022 | Contrastive Learning, cross-modal alignment | Code Available | 0 |
| ComCLIP: Training-Free Compositional Image and Text Matching | Nov 25, 2022 | Image-text matching, Image-text Retrieval | Code Available | 1 |
| Self-supervised vision-language pretraining for Medical visual question answering | Nov 24, 2022 | Contrastive Learning, Image-text matching | Code Available | 1 |
| DeepSolo: Let Transformer Decoder with Explicit Points Solo for Text Spotting | Nov 19, 2022 | Decoder, Scene Text Detection | Code Available | 2 |
| Zero-Shot Text Matching for Automated Auditing using Sentence Transformers | Oct 28, 2022 | Information Retrieval, Question Answering | Unverified | 0 |
| UPainting: Unified Text-to-Image Diffusion Generation with Cross-modal Guidance | Oct 28, 2022 | Image Generation, Image-text matching | Unverified | 0 |
| Dissecting Deep Metric Learning Losses for Image-Text Retrieval | Oct 21, 2022 | Cross-Modal Retrieval, Image-text matching | Code Available | 0 |
| Do Vision-and-Language Transformers Learn Grounded Predicate-Noun Dependencies? | Oct 21, 2022 | Image-text matching, Language Modeling | Code Available | 0 |
| Law Article-Enhanced Legal Case Matching: a Causal Learning Approach | Oct 20, 2022 | Articles, Semantic Text Matching | Code Available | 0 |
| MAP: Multimodal Uncertainty-Aware Vision-Language Pre-training Model | Oct 11, 2022 | Contrastive Learning, Image-text matching | Code Available | 1 |
| Using Interventions to Improve Out-of-Distribution Generalization of Text-Matching Recommendation Systems | Oct 7, 2022 | Language Modelling, Out-of-Distribution Generalization | Unverified | 0 |
| Adaptive Feature Discrimination and Denoising for Asymmetric Text Matching | Oct 1, 2022 | Denoising, Text Matching | Unverified | 0 |
| One-Shot Doc Snippet Detection: Powering Search in Document Beyond Text | Sep 12, 2022 | document understanding, object-detection | Unverified | 0 |
| Random Text Perturbations Work, but not Always | Sep 2, 2022 | Classification, Data Augmentation | Code Available | 0 |
| GRIT-VLP: Grouped Mini-batch Sampling for Efficient Vision and Language Pre-training | Aug 8, 2022 | Image-text matching, Language Modeling | Code Available | 1 |
| ALADIN: Distilling Fine-grained Alignment Scores for Efficient Image-Text Matching and Retrieval | Jul 29, 2022 | Cross-Modal Retrieval, Image-text matching | Code Available | 0 |
| Zero-Shot Video Captioning with Evolving Pseudo-Tokens | Jul 22, 2022 | Image Captioning, Image-text matching | Code Available | 1 |
| Don't Stop Learning: Towards Continual Learning for the CLIP Model | Jul 19, 2022 | Continual Learning, Image-text matching | Unverified | 0 |
| SummScore: A Comprehensive Evaluation Metric for Summary Quality Based on Cross-Encoder | Jul 11, 2022 | Diversity, Text Matching | Unverified | 0 |
| Open-Vocabulary Multi-Label Classification via Multi-Modal Knowledge Transfer | Jul 5, 2022 | Image-text matching, Knowledge Distillation | Code Available | 1 |
| Contrastive Cross-Modal Knowledge Sharing Pre-training for Vision-Language Representation Learning and Retrieval | Jul 2, 2022 | Contrastive Learning, Cross-Modal Retrieval | Unverified | 0 |
| A Dense Representation Framework for Lexical and Semantic Matching | Jun 20, 2022 | Retrieval, Semantic Text Matching | Code Available | 1 |
| What is Where by Looking: Weakly-Supervised Open-World Phrase-Grounding without Text Inputs | Jun 19, 2022 | Benchmarking, Image Captioning | Code Available | 1 |
| Delving into the Openness of CLIP | Jun 4, 2022 | image-classification, Image Classification | Code Available | 0 |
| GR-GAN: Gradual Refinement Text-to-image Generation | May 23, 2022 | Generative Adversarial Network, Image Generation | Code Available | 0 |
| TextMatcher: Cross-Attentional Neural Network to Compare Image and Text | May 11, 2022 | Text Matching | Unverified | 0 |
| CCMB: A Large-scale Chinese Cross-modal Benchmark | May 8, 2022 | image-classification, Image Classification | Code Available | 1 |
| HumanAL: Calibrating Human Matching Beyond a Single Task | May 6, 2022 | Text Matching | Unverified | 0 |
| Language Models Can See: Plugging Visual Controls in Text Generation | May 5, 2022 | Image Captioning, Image-text matching | Code Available | 2 |
| Declaration-based Prompt Tuning for Visual Question Answering | May 5, 2022 | Image-text matching, Language Modeling | Code Available | 1 |
| ClusterFormer: Neural Clustering Attention for Efficient and Effective Transformer | May 1, 2022 | Clustering, Machine Translation | Unverified | 0 |
| Adaptable Text Matching via Meta-Weight Regulator | Apr 27, 2022 | Meta-Learning, Natural Language Inference | Unverified | 0 |
| Uncertainty-based Cross-Modal Retrieval with Probabilistic Representations | Apr 20, 2022 | Cross-Modal Retrieval, Image Retrieval | Unverified | 0 |
| Modality-Balanced Embedding for Video Retrieval | Apr 18, 2022 | Retrieval, Text Matching | Unverified | 0 |
| No Token Left Behind: Explainability-Aided Image Classification and Generation | Apr 11, 2022 | image-classification, Image Classification | Code Available | 1 |
| ECCV Caption: Correcting False Negatives by Collecting Machine-and-Human-verified Image-Caption Associations for MS-COCO | Apr 7, 2022 | Image-text matching, Text Matching | Code Available | 1 |