| CMC-Bench: Towards a New Paradigm of Visual Signal Compression | Jun 13, 2024 | Image Compression, Image to text | Code Available | 1 |
| Cephalo: Multi-Modal Vision-Language Models for Bio-Inspired Materials Analysis and Design | May 29, 2024 | Dataset Generation, Image to text | Code Available | 1 |
| Language-Oriented Semantic Latent Representation for Image Transmission | May 16, 2024 | Image to text, Semantic Communication | Code Available | 1 |
| LaDiC: Are Diffusion Models Really Inferior to Autoregressive Counterparts for Image-to-Text Generation? | Apr 16, 2024 | Image Captioning, Image Generation | Code Available | 1 |
| ObjectCompose: Evaluating Resilience of Vision-Based Models on Object-to-Background Compositional Changes | Mar 7, 2024 | Image to text, Object | Code Available | 1 |
| Can MLLMs Perform Text-to-Image In-Context Learning? | Feb 2, 2024 | Image Generation, Image to text | Code Available | 1 |
| Benchmarking Large Multimodal Models against Common Corruptions | Jan 22, 2024 | Benchmarking, Image to text | Code Available | 1 |
| Improving Image Restoration through Removing Degradations in Textual Representations | Dec 28, 2023 | Deblurring, Denoising | Code Available | 1 |
| Safe-CLIP: Removing NSFW Concepts from Vision-and-Language Models | Nov 27, 2023 | Cross-Modal Retrieval, Image Generation | Code Available | 1 |
| UrbanCLIP: Learning Text-enhanced Urban Region Profiling with Contrastive Language-Image Pretraining from the Web | Oct 22, 2023 | Image to text, Language Modeling | Code Available | 1 |