| Deep Imbalanced Attribute Classification using Visual Attention Aggregation | Jul 10, 2018 | Attribute, Classification | Code Available | 0 |
| Varying image description tasks: spoken versus written descriptions | Aug 1, 2018 | Image Description | Code Available | 0 |
| Localized Symbolic Knowledge Distillation for Visual Commonsense Models | Dec 8, 2023 | Image Description, Instruction Following | Code Available | 0 |
| Long-term Recurrent Convolutional Networks for Visual Recognition and Description | Nov 17, 2014 | Image Description, Retrieval | Code Available | 0 |
| Improving Visual-Semantic Embeddings by Learning Semantically-Enhanced Hard Negatives for Cross-modal Information Retrieval | Oct 10, 2022 | Cross-Modal Information Retrieval, Image Description | Code Available | 0 |
| MAGID: An Automated Pipeline for Generating Synthetic Multi-modal Datasets | Mar 5, 2024 | Diversity, Image Description | Code Available | 0 |
| Measuring the Diversity of Automatic Image Descriptions | Aug 1, 2018 | Diversity, Image Description | Code Available | 0 |
| MiCEval: Unveiling Multimodal Chain of Thought's Quality via Image Description and Reasoning Steps | Oct 18, 2024 | Image Description, Informativeness | Code Available | 0 |
| What a neural language model tells us about spatial relations | Jun 1, 2019 | Image Description, Language Modeling | Code Available | 0 |
| Face2Text: Collecting an Annotated Image Description Corpus for the Generation of Rich Face Descriptions | Mar 10, 2018 | Image Description, Image to text | Code Available | 0 |