| Uncertainty-based Cross-Modal Retrieval with Probabilistic Representations | Apr 20, 2022 | Cross-Modal Retrieval, Image Retrieval | Unverified | 0 |
| COTS: Collaborative Two-Stream Vision-Language Pre-Training Model for Cross-Modal Retrieval | Apr 15, 2022 | Contrastive Learning, Cross-Modal Retrieval | Unverified | 0 |
| Characterizing and Understanding the Behavior of Quantized Models for Reliable Deployment | Apr 8, 2022 | Image to text, Language Modeling | Code Available | 0 |
| Two-stream Hierarchical Similarity Reasoning for Image-text Matching | Mar 10, 2022 | Image-text matching, Image to text | Unverified | 0 |
| A Thousand Words Are Worth More Than a Picture: Natural Language-Centric Outside-Knowledge Visual Question Answering | Jan 14, 2022 | Generative Question Answering, Image to text | Unverified | 0 |
| EI-CLIP: Entity-Aware Interventional Contrastive Learning for E-Commerce Cross-Modal Retrieval | Jan 1, 2022 | Causal Inference, Contrastive Learning | Unverified | 0 |
| Transform-Retrieve-Generate: Natural Language-Centric Outside-Knowledge Visual Question Answering | Jan 1, 2022 | Generative Question Answering, Image to text | Unverified | 0 |
| ERNIE-ViLG: Unified Generative Pre-training for Bidirectional Vision-Language Generation | Dec 31, 2021 | Image Captioning, Image Generation | Code Available | 1 |
| Distilled Dual-Encoder Model for Vision-Language Understanding | Dec 16, 2021 | Image to text, model | Code Available | 1 |
| Self-Supervised Image-to-Text and Text-to-Image Synthesis | Dec 9, 2021 | Image Generation, Image to text | Code Available | 0 |
| Exploration into Translation-Equivariant Image Quantization | Dec 1, 2021 | Image Generation, Image to text | Code Available | 0 |
| ZeroCap: Zero-Shot Image-to-Text Generation for Visual-Semantic Arithmetic | Nov 29, 2021 | Contrastive Learning, Descriptive | Code Available | 1 |
| Utilizing Resource-Rich Language Datasets for End-to-End Scene Text Recognition in Resource-Poor Languages | Nov 24, 2021 | Decoder, Image to text | Unverified | 0 |
| L-Verse: Bidirectional Generation Between Image and Text | Nov 22, 2021 | Image Captioning, Image Generation | Code Available | 1 |
| Unifying Multimodal Transformer for Bi-directional Image and Text Generation | Oct 19, 2021 | Image Generation, Image to text | Code Available | 1 |
| Contrastive Learning of Visual-Semantic Embeddings | Oct 17, 2021 | Contrastive Learning, image-classification | Unverified | 0 |
| Survey of Visual-Semantic Embedding Methods for Zero-Shot Image Retrieval | May 16, 2021 | Graph Generation, Image Captioning | Unverified | 0 |
| Concadia: Towards Image-Based Text Generation with a Purpose | Apr 16, 2021 | Image Captioning, Image to text | Code Available | 1 |
| Knowledge driven Description Synthesis for Floor Plan Interpretation | Mar 15, 2021 | Caption Generation, Descriptive | Unverified | 0 |
| Progressive Transformer-Based Generation of Radiology Reports | Feb 19, 2021 | Image to text, Text Generation | Code Available | 1 |
| Improving Factual Completeness and Consistency of Image-to-Text Radiology Report Generation | Oct 20, 2020 | Image to text, Natural Language Inference | Code Available | 1 |
| Hierarchical Gumbel Attention Network for Text-based Person Search | Oct 10, 2020 | Image Retrieval, Image to text | Unverified | 0 |
| Cross-Modal Alignment with Mixture Experts Neural Network for Intral-City Retail Recommendation | Sep 17, 2020 | cross-modal alignment, Image to text | Unverified | 0 |
| Development of a New Image-to-text Conversion System for Pashto, Farsi and Traditional Chinese | May 8, 2020 | Image to text, Optical Character Recognition (OCR) | Unverified | 0 |
| Multimodal Intelligence: Representation Learning, Information Fusion, and Applications | Nov 10, 2019 | Caption Generation, Image Generation | Unverified | 0 |