| Towards image compression with perfect realism at ultra-low bitrates | Oct 16, 2023 | Image Compression, Image Description | Code Available | 1 |
| Bounding and Filling: A Fast and Flexible Framework for Image Captioning | Oct 15, 2023 | Image Captioning, Image Description | Code Available | 0 |
| MiniGPT-v2: large language model as a unified interface for vision-language multi-task learning | Oct 14, 2023 | Image Classification, Image Description | Code Available | 7 |
| ContextRef: Evaluating Referenceless Metrics For Image Description Generation | Sep 21, 2023 | Image Description | Code Available | 0 |
| A skeletonization algorithm for gradient-based optimization | Sep 5, 2023 | Benchmarking, Deep Learning | Code Available | 1 |
| A Fine-Grained Image Description Generation Method Based on Joint Objectives | Sep 2, 2023 | Image Description, Object | Unverified | 0 |
| Qwen-VL: A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond | Aug 24, 2023 | Chart Question Answering, FS-MEVQA | Code Available | 5 |
| Chatting Makes Perfect: Chat-based Image Retrieval | May 31, 2023 | Chat-based Image Retrieval, Image Description | Code Available | 1 |
| PandaGPT: One Model To Instruction-Follow Them All | May 25, 2023 | All, Image Description | Code Available | 2 |
| DiffCap: Exploring Continuous Diffusion on Image Captioning | May 20, 2023 | Caption Generation, Diversity | Unverified | 0 |