| Uncertainty-based Cross-Modal Retrieval with Probabilistic Representations | Apr 20, 2022 | Cross-Modal Retrieval, Image Retrieval | Unverified | 0 |
| COTS: Collaborative Two-Stream Vision-Language Pre-Training Model for Cross-Modal Retrieval | Apr 15, 2022 | Contrastive Learning, Cross-Modal Retrieval | Unverified | 0 |
| Characterizing and Understanding the Behavior of Quantized Models for Reliable Deployment | Apr 8, 2022 | Image to text, Language Modeling | Code Available | 0 |
| Two-stream Hierarchical Similarity Reasoning for Image-text Matching | Mar 10, 2022 | Image-text matching, Image to text | Unverified | 0 |
| A Thousand Words Are Worth More Than a Picture: Natural Language-Centric Outside-Knowledge Visual Question Answering | Jan 14, 2022 | Generative Question Answering, Image to text | Unverified | 0 |
| EI-CLIP: Entity-Aware Interventional Contrastive Learning for E-Commerce Cross-Modal Retrieval | Jan 1, 2022 | Causal Inference, Contrastive Learning | Unverified | 0 |
| Transform-Retrieve-Generate: Natural Language-Centric Outside-Knowledge Visual Question Answering | Jan 1, 2022 | Generative Question Answering, Image to text | Unverified | 0 |
| ERNIE-ViLG: Unified Generative Pre-training for Bidirectional Vision-Language Generation | Dec 31, 2021 | Image Captioning, Image Generation | Code Available | 1 |
| Distilled Dual-Encoder Model for Vision-Language Understanding | Dec 16, 2021 | Image to text, model | Code Available | 1 |
| Self-Supervised Image-to-Text and Text-to-Image Synthesis | Dec 9, 2021 | Image Generation, Image to text | Code Available | 0 |
| Exploration into Translation-Equivariant Image Quantization | Dec 1, 2021 | Image Generation, Image to text | Code Available | 0 |
| ZeroCap: Zero-Shot Image-to-Text Generation for Visual-Semantic Arithmetic | Nov 29, 2021 | Contrastive Learning, Descriptive | Code Available | 1 |
| Utilizing Resource-Rich Language Datasets for End-to-End Scene Text Recognition in Resource-Poor Languages | Nov 24, 2021 | Decoder, Image to text | Unverified | 0 |
| L-Verse: Bidirectional Generation Between Image and Text | Nov 22, 2021 | Image Captioning, Image Generation | Code Available | 1 |
| Unifying Multimodal Transformer for Bi-directional Image and Text Generation | Oct 19, 2021 | Image Generation, Image to text | Code Available | 1 |
| Contrastive Learning of Visual-Semantic Embeddings | Oct 17, 2021 | Contrastive Learning, image-classification | Unverified | 0 |
| Survey of Visual-Semantic Embedding Methods for Zero-Shot Image Retrieval | May 16, 2021 | Graph Generation, Image Captioning | Unverified | 0 |
| Concadia: Towards Image-Based Text Generation with a Purpose | Apr 16, 2021 | Image Captioning, Image to text | Code Available | 1 |
| Knowledge driven Description Synthesis for Floor Plan Interpretation | Mar 15, 2021 | Caption Generation, Descriptive | Unverified | 0 |
| Progressive Transformer-Based Generation of Radiology Reports | Feb 19, 2021 | Image to text, Text Generation | Code Available | 1 |
| Improving Factual Completeness and Consistency of Image-to-Text Radiology Report Generation | Oct 20, 2020 | Image to text, Natural Language Inference | Code Available | 1 |
| Hierarchical Gumbel Attention Network for Text-based Person Search | Oct 10, 2020 | Image Retrieval, Image to text | Unverified | 0 |
| Cross-Modal Alignment with Mixture Experts Neural Network for Intral-City Retail Recommendation | Sep 17, 2020 | cross-modal alignment, Image to text | Unverified | 0 |
| Development of a New Image-to-text Conversion System for Pashto, Farsi and Traditional Chinese | May 8, 2020 | Image to text, Optical Character Recognition (OCR) | Unverified | 0 |
| Multimodal Intelligence: Representation Learning, Information Fusion, and Applications | Nov 10, 2019 | Caption Generation, Image Generation | Unverified | 0 |
| Illegible Text to Readable Text: An Image-to-Image Transformation using Conditional Sliced Wasserstein Adversarial Networks | Oct 11, 2019 | Generative Adversarial Network, Image-to-Image Translation | Unverified | 0 |
| Aligning Multilingual Word Embeddings for Cross-Modal Retrieval Task | Oct 8, 2019 | Cross-Modal Retrieval, Image to text | Code Available | 0 |
| From Image to Text in Sentiment Analysis via Regression and Deep Learning | Sep 1, 2019 | Image to text, regression | Unverified | 0 |
| Knowledge Aware Semantic Concept Expansion for Image-Text Matching | Aug 10, 2019 | Common Sense Reasoning, Content-Based Image Retrieval | Unverified | 0 |
| MirrorGAN: Learning Text-to-image Generation by Redescription | Mar 14, 2019 | Diversity, Image Generation | Code Available | 0 |
| Canonical Correlation Analysis for Misaligned Satellite Image Change Detection | Dec 21, 2018 | Action Recognition, Change Detection | Unverified | 0 |
| Doc2Im: document to image conversion through self-attentive embedding | Nov 8, 2018 | Document To Image Conversion, document understanding | Unverified | 0 |
| SpatialVOC2K: A Multilingual Dataset of Images with Annotations and Features for Spatial Relations between Objects | Nov 1, 2018 | Image to text, Object | Code Available | 0 |
| EmojiGAN: learning emojis distributions with a generative model | Oct 1, 2018 | Image Captioning, Image to text | Unverified | 0 |
| Text-to-Image-to-Text Translation using Cycle Consistent Adversarial Networks | Aug 14, 2018 | Image to text, Sentence | Code Available | 0 |
| Deductron -- A Recurrent Neural Network | Jun 23, 2018 | Image to text, Optical Character Recognition (OCR) | Unverified | 0 |
| Using Inter-Sentence Diverse Beam Search to Reduce Redundancy in Visual Storytelling | May 30, 2018 | Image to text, Sentence | Unverified | 0 |
| Turbo Learning for Captionbot and Drawingbot | May 21, 2018 | Image Captioning, Image Generation | Unverified | 0 |
| Face2Text: Collecting an Annotated Image Description Corpus for the Generation of Rich Face Descriptions | Mar 10, 2018 | Image Description, Image to text | Code Available | 0 |
| Synthesizing Novel Pairs of Image and Text | Dec 18, 2017 | Image to text | Unverified | 0 |
| From Image to Text Classification: A Novel Approach based on Clustering Word Embeddings | Jul 25, 2017 | Clustering, General Classification | Unverified | 0 |
| Better Text Understanding Through Image-To-Text Transfer | May 23, 2017 | Image to text | Unverified | 0 |
| I2T2I: Learning Text to Image Synthesis with Textual Data Augmentation | Mar 20, 2017 | Caption Generation, Data Augmentation | Unverified | 0 |
| A Gentle Tutorial of Recurrent Neural Network with Error Backpropagation | Oct 8, 2016 | Handwriting Recognition, Image to text | Code Available | 0 |
| Learning Deep Structure-Preserving Image-Text Embeddings | Nov 19, 2015 | Image Retrieval, Image to text | Unverified | 0 |
| Effective Use of Word Order for Text Categorization with Convolutional Neural Networks | Dec 1, 2014 | General Classification, Image to text | Code Available | 0 |