| LRSCLIP: A Vision-Language Foundation Model for Aligning Remote Sensing Image with Longer Text | Mar 25, 2025 | Cross-Modal Retrieval, Hallucination | Code Available | 1 | 5 |
| MAGVLT: Masked Generative Vision-and-Language Transformer | Mar 21, 2023 | Image Captioning, Image Generation | Code Available | 1 | 5 |
| Language Quantized AutoEncoders: Towards Unsupervised Text-Image Alignment | Feb 2, 2023 | Attribute, Few-Shot Image Classification | Code Available | 1 | 5 |
| What is Where by Looking: Weakly-Supervised Open-World Phrase-Grounding without Text Inputs | Jun 19, 2022 | Benchmarking, Image Captioning | Code Available | 1 | 5 |
| Linearly Mapping from Image to Text Space | Sep 30, 2022 | Image Captioning, Image to text | Code Available | 1 | 5 |
| Improving Factual Completeness and Consistency of Image-to-Text Radiology Report Generation | Oct 20, 2020 | Image to text, Natural Language Inference | Code Available | 1 | 5 |
| Improving Image Restoration through Removing Degradations in Textual Representations | Dec 28, 2023 | Deblurring, Denoising | Code Available | 1 | 5 |
| Towards Unifying Medical Vision-and-Language Pre-training via Soft Prompts | Feb 17, 2023 | Image Retrieval, Image-text Classification | Code Available | 1 | 5 |
| Bootstrapping Vision-Language Learning with Decoupled Language Pre-training | Jul 13, 2023 | Image to text | Code Available | 1 | 5 |
| UniCMs: A Unified Consistency Model For Efficient Multimodal Generation and Understanding | Feb 8, 2025 | Denoising, Image Generation | Code Available | 1 | 5 |
| Brain Captioning: Decoding human brain activity into images and text | May 19, 2023 | Brain Decoding, Depth Estimation | Code Available | 1 | 5 |
| Distilled Dual-Encoder Model for Vision-Language Understanding | Dec 16, 2021 | Image to text, model | Code Available | 1 | 5 |
| LaDiC: Are Diffusion Models Really Inferior to Autoregressive Counterparts for Image-to-Text Generation? | Apr 16, 2024 | Image Captioning, Image Generation | Code Available | 1 | 5 |
| Visual Spatial Description: Controlled Spatial-Oriented Image-to-Text Generation | Oct 20, 2022 | Decoder, Image Captioning | Code Available | 1 | 5 |
| Can MLLMs Perform Text-to-Image In-Context Learning? | Feb 2, 2024 | Image Generation, Image to text | Code Available | 1 | 5 |
| What You See is What You Read? Improving Text-Image Alignment Evaluation | May 17, 2023 | Image Generation, Image to text | Code Available | 1 | 5 |
| DoraCycle: Domain-Oriented Adaptation of Unified Generative Model in Multimodal Cycles | Mar 5, 2025 | Domain Adaptation, Image to text | Code Available | 1 | 5 |
| Language-Oriented Semantic Latent Representation for Image Transmission | May 16, 2024 | Image to text, Semantic Communication | Code Available | 1 | 5 |
| Safe-CLIP: Removing NSFW Concepts from Vision-and-Language Models | Nov 27, 2023 | Cross-Modal Retrieval, Image Generation | Code Available | 1 | 5 |
| Text-to-Image-to-Text Translation using Cycle Consistent Adversarial Networks | Aug 14, 2018 | Image to text, Sentence | Code Available | 0 | 5 |
| BiVLC: Extending Vision-Language Compositionality Evaluation with Text-to-Image Retrieval | Jun 14, 2024 | Image Retrieval, Image to text | Code Available | 0 | 5 |
| Towards a text-based quantitative and explainable histopathology image analysis | Jul 10, 2024 | image-classification, Image Classification | Code Available | 0 | 5 |
| Exploration into Translation-Equivariant Image Quantization | Dec 1, 2021 | Image Generation, Image to text | Code Available | 0 | 5 |
| UniMoCo: Unified Modality Completion for Robust Multi-Modal Embeddings | May 17, 2025 | Image to text, Information Retrieval | Code Available | 0 | 5 |
| Self-Supervised Image-to-Text and Text-to-Image Synthesis | Dec 9, 2021 | Image Generation, Image to text | Code Available | 0 | 5 |
| SpatialVOC2K: A Multilingual Dataset of Images with Annotations and Features for Spatial Relations between Objects | Nov 1, 2018 | Image to text, Object | Code Available | 0 | 5 |
| A Data-Driven Guided Decoding Mechanism for Diagnostic Captioning | Jun 20, 2024 | Diagnostic, Image to text | Code Available | 0 | 5 |
| Survey on Abstractive Text Summarization: Dataset, Models, and Metrics | Dec 22, 2024 | Abstractive Text Summarization, General Knowledge | Code Available | 0 | 5 |
| Revealing and Reducing Gender Biases in Vision and Language Assistants (VLAs) | Oct 25, 2024 | Attribute, Image to text | Code Available | 0 | 5 |
| Delving into the Openness of CLIP | Jun 4, 2022 | image-classification, Image Classification | Code Available | 0 | 5 |
| RoCOCO: Robustness Benchmark of MS-COCO to Stress-test Image-Text Matching Models | Apr 21, 2023 | Cross-Modal Retrieval, Image-text matching | Code Available | 0 | 5 |
| Aligning Multilingual Word Embeddings for Cross-Modal Retrieval Task | Oct 8, 2019 | Cross-Modal Retrieval, Image to text | Code Available | 0 | 5 |
| Real-world validation of a multimodal LLM-powered pipeline for High-Accuracy Clinical Trial Patient Matching leveraging EHR data | Mar 19, 2025 | Image to text | Code Available | 0 | 5 |
| CoCoT: Contrastive Chain-of-Thought Prompting for Large Multimodal Models with Multiple Image Inputs | Jan 5, 2024 | Image Comprehension, Image to text | Code Available | 0 | 5 |
| Probing Multimodal Large Language Models for Global and Local Semantic Representations | Feb 27, 2024 | Image to text, object-detection | Code Available | 0 | 5 |
| Align before Search: Aligning Ads Image to Text for Accurate Cross-Modal Sponsored Search | Sep 28, 2023 | cross-modal alignment, Cross-Modal Retrieval | Code Available | 0 | 5 |
| Pragmatic Radiology Report Generation | Nov 28, 2023 | Image to text | Code Available | 0 | 5 |
| PromptHash: Affinity-Prompted Collaborative Cross-Modal Learning for Adaptive Hashing Retrieval | Mar 20, 2025 | Contrastive Learning, Cross-Modal Retrieval | Code Available | 0 | 5 |
| Adaptively Clustering Neighbor Elements for Image-Text Generation | Jan 5, 2023 | Clustering, Decoder | Code Available | 0 | 5 |
| Benchmarking Vision-Language Contrastive Methods for Medical Representation Learning | Jun 11, 2024 | Benchmarking, Contrastive Learning | Code Available | 0 | 5 |
| CLIP-FSAC++: Few-Shot Anomaly Classification with Anomaly Descriptor Based on CLIP | Dec 5, 2024 | Anomaly Classification, Anomaly Detection | Code Available | 0 | 5 |
| GABInsight: Exploring Gender-Activity Binding Bias in Vision-Language Models | Jul 30, 2024 | Image to text, Image-to-Text Retrieval | Code Available | 0 | 5 |
| Face2Text: Collecting an Annotated Image Description Corpus for the Generation of Rich Face Descriptions | Mar 10, 2018 | Image Description, Image to text | Code Available | 0 | 5 |
| MultiQG-TI: Towards Question Generation from Multi-modal Sources | Jul 7, 2023 | Image to text, Optical Character Recognition | Code Available | 0 | 5 |
| CLIP-based Synergistic Knowledge Transfer for Text-based Person Retrieval | Sep 18, 2023 | Image to text, Person Retrieval | Code Available | 0 | 5 |
| PromptHash:Affinity-Prompted Collaborative Cross-Modal Learning for Adaptive Hashing Retrieval | Jan 1, 2025 | Contrastive Learning, Image Retrieval | Code Available | 0 | 5 |
| MirrorGAN: Learning Text-to-image Generation by Redescription | Mar 14, 2019 | Diversity, Image Generation | Code Available | 0 | 5 |
| Multi-LLM Collaborative Caption Generation in Scientific Documents | Jan 5, 2025 | Caption Generation, Image to text | Code Available | 0 | 5 |
| Characterizing and Understanding the Behavior of Quantized Models for Reliable Deployment | Apr 8, 2022 | Image to text, Language Modeling | Code Available | 0 | 5 |
| MatCha: Enhancing Visual Language Pretraining with Math Reasoning and Chart Derendering | Dec 19, 2022 | Chart Question Answering, Data Summarization | Code Available | 0 | 5 |