| Brain Captioning: Decoding human brain activity into images and text | May 19, 2023 | Brain Decoding, Depth Estimation | Code Available | 1 | 5 |
| Distilled Dual-Encoder Model for Vision-Language Understanding | Dec 16, 2021 | Image to text, model | Code Available | 1 | 5 |
| FETA: Towards Specializing Foundation Models for Expert Task Applications | Sep 8, 2022 | Domain Generalization, Few-Shot Learning | Code Available | 1 | 5 |
| Linearly Mapping from Image to Text Space | Sep 30, 2022 | Image Captioning, Image to text | Code Available | 1 | 5 |
| Can MLLMs Perform Text-to-Image In-Context Learning? | Feb 2, 2024 | Image Generation, Image to text | Code Available | 1 | 5 |
| Safe Latent Diffusion: Mitigating Inappropriate Degeneration in Diffusion Models | Nov 9, 2022 | Image Generation, Image to text | Code Available | 1 | 5 |
| DoraCycle: Domain-Oriented Adaptation of Unified Generative Model in Multimodal Cycles | Mar 5, 2025 | Domain Adaptation, Image to text | Code Available | 1 | 5 |
| See or Guess: Counterfactually Regularized Image Captioning | Aug 29, 2024 | Causal Inference, counterfactual | Code Available | 1 | 5 |
| UniFashion: A Unified Vision-Language Model for Multimodal Fashion Retrieval and Generation | Aug 21, 2024 | Image Generation, Image Retrieval | Code Available | 1 | 5 |
| Pragmatic Radiology Report Generation | Nov 28, 2023 | Image to text | Code Available | 0 | 5 |