| Instruction Tuning-free Visual Token Complement for Multimodal LLMs | Aug 9, 2024 | Image GenerationImage to text | —Unverified | 0 | 0 |
| Development of a New Image-to-text Conversion System for Pashto, Farsi and Traditional Chinese | May 8, 2020 | Image to textOptical Character Recognition (OCR) | —Unverified | 0 | 0 |
| Improving the Factual Correctness of Radiology Report Generation with Semantic Rewards | Oct 21, 2022 | Image to textnamed-entity-recognition | —Unverified | 0 | 0 |
| Improving Table Structure Recognition with Visual-Alignment Sequential Coordinate Modeling | Mar 13, 2023 | DecoderImage to text | —Unverified | 0 | 0 |
| Deductron -- A Recurrent Neural Network | Jun 23, 2018 | Image to textOptical Character Recognition (OCR) | —Unverified | 0 | 0 |
| Improving Medical Visual Representation Learning with Pathological-level Cross-Modal Alignment and Correlation Exploration | Jun 12, 2025 | cross-modal alignmentImage to text | —Unverified | 0 | 0 |
| Interpreting Vision and Language Generative Models with Semantic Visual Priors | Apr 28, 2023 | Image to text | —Unverified | 0 | 0 |
| Is Cross-modal Information Retrieval Possible without Training? | Apr 20, 2023 | Contrastive LearningCross-Modal Information Retrieval | —Unverified | 0 | 0 |
| DART: Disease-aware Image-Text Alignment and Self-correcting Re-alignment for Trustworthy Radiology Report Generation | Apr 16, 2025 | Contrastive LearningImage to text | —Unverified | 0 | 0 |
| An Online Learning Approach to Prompt-based Selection of Generative Models | Oct 17, 2024 | Image to text | —Unverified | 0 | 0 |