| Evaluating Text-to-Visual Generation with Image-to-Text Generation | Apr 1, 2024 | Image to text, Question Answering | Code Available | 3 |
| BIMCV-R: A Landmark Dataset for 3D CT Text-Image Retrieval | Mar 24, 2024 | Diagnostic, Image Retrieval | —Unverified | 0 |
| Eyes Closed, Safety On: Protecting Multimodal LLMs via Image-to-Text Transformation | Mar 14, 2024 | Image to text, Optical Character Recognition (OCR) | —Unverified | 0 |
| ObjectCompose: Evaluating Resilience of Vision-Based Models on Object-to-Background Compositional Changes | Mar 7, 2024 | Image to text, Object | Code Available | 1 |
| MedM2G: Unifying Medical Multi-Modal Generation via Cross-Guided Diffusion with Visual Invariant | Mar 7, 2024 | Clinical Knowledge, Image to text | —Unverified | 0 |
| CLIP the Bias: How Useful is Balancing Data in Multimodal Learning? | Mar 7, 2024 | Image to text, Image-to-Text Retrieval | —Unverified | 0 |
| Enhancing Vision-Language Pre-training with Rich Supervisions | Mar 5, 2024 | Image to text, Table Detection | —Unverified | 0 |
| Attention Guidance Mechanism for Handwritten Mathematical Expression Recognition | Mar 4, 2024 | Image to text | —Unverified | 0 |
| Probing Multimodal Large Language Models for Global and Local Semantic Representations | Feb 27, 2024 | Image to text, object-detection | Code Available | 0 |
| A Unified Framework and Dataset for Assessing Societal Bias in Vision-Language Models | Feb 21, 2024 | Benchmarking, Image to text | —Unverified | 0 |