| Beyond Color and Lines: Zero-Shot Style-Specific Image Variations with Coordinated Semantics | Oct 24, 2024 | Image to text, Image-Variation | —Unverified | 0 |
| Improving Medical Visual Representation Learning with Pathological-level Cross-Modal Alignment and Correlation Exploration | Jun 12, 2025 | cross-modal alignment, Image to text | —Unverified | 0 |
| Improving Table Structure Recognition with Visual-Alignment Sequential Coordinate Modeling | Mar 13, 2023 | Decoder, Image to text | —Unverified | 0 |
| Improving the Factual Correctness of Radiology Report Generation with Semantic Rewards | Oct 21, 2022 | Image to text, named-entity-recognition | —Unverified | 0 |
| Towards Cross-modal Retrieval in Chinese Cultural Heritage Documents: Dataset and Solution | May 16, 2025 | Cross-Modal Retrieval, Image to text | —Unverified | 0 |
| From Image to Text in Sentiment Analysis via Regression and Deep Learning | Sep 1, 2019 | Image to text, regression | —Unverified | 0 |
| Interpreting Vision and Language Generative Models with Semantic Visual Priors | Apr 28, 2023 | Image to text | —Unverified | 0 |
| Is Cross-modal Information Retrieval Possible without Training? | Apr 20, 2023 | Contrastive Learning, Cross-Modal Information Retrieval | —Unverified | 0 |
| CoBIT: A Contrastive Bi-directional Image-Text Generation Model | Mar 23, 2023 | Decoder, Image Generation | —Unverified | 0 |
| From Image to Text Classification: A Novel Approach based on Clustering Word Embeddings | Jul 25, 2017 | Clustering, General Classification | —Unverified | 0 |