| Caption Anything: Interactive Image Description with Diverse Multimodal Controls | May 4, 2023 | controllable image captioningImage Captioning | CodeCode Available | 3 |
| MiniGPT-4: Enhancing Vision-Language Understanding with Advanced Large Language Models | Apr 20, 2023 | Image DescriptionLanguage Modelling | CodeCode Available | 7 |
| Fan-Beam Binarization Difference Projection (FB-BDP): A Novel Local Object Descriptor for Fine-Grained Leaf Image Retrieval | Jan 1, 2023 | BinarizationImage Description | CodeCode Available | 0 |
| DialogCC: An Automated Pipeline for Creating High-Quality Multi-Modal Dialogue Dataset | Dec 8, 2022 | DiversityImage Description | CodeCode Available | 1 |
| Visual Spatial Description: Controlled Spatial-Oriented Image-to-Text Generation | Oct 20, 2022 | DecoderImage Captioning | CodeCode Available | 1 |
| Improving Visual-Semantic Embeddings by Learning Semantically-Enhanced Hard Negatives for Cross-modal Information Retrieval | Oct 10, 2022 | Cross-Modal Information RetrievalImage Description | CodeCode Available | 0 |
| Facial Expression Recognition and Image Description Generation in Vietnamese | Aug 12, 2022 | DescriptiveEmotion Recognition | —Unverified | 0 |
| Skeletal Human Action Recognition using Hybrid Attention based Graph Convolutional Network | Jul 12, 2022 | Action RecognitionImage Description | CodeCode Available | 0 |
| Image Description Dataset for Language Learners | Jun 1, 2022 | Image DescriptionSentence | —Unverified | 0 |
| Multilingual Image Corpus – Towards a Multimodal and Multilingual Dataset | Jun 1, 2022 | Caption Generationimage-classification | —Unverified | 0 |