| Long-term Recurrent Convolutional Networks for Visual Recognition and Description | Nov 17, 2014 | Image Description, Retrieval | Code Available | 0 | 5 |
| Improving Visual-Semantic Embeddings by Learning Semantically-Enhanced Hard Negatives for Cross-modal Information Retrieval | Oct 10, 2022 | Cross-Modal Information Retrieval, Image Description | Code Available | 0 | 5 |
| CIDEr-R: Robust Consensus-based Image Description Evaluation | Sep 28, 2021 | Descriptive, Image Description | Code Available | 0 | 5 |
| Localized Symbolic Knowledge Distillation for Visual Commonsense Models | Dec 8, 2023 | Image Description, Instruction Following | Code Available | 0 | 5 |
| MAGID: An Automated Pipeline for Generating Synthetic Multi-modal Datasets | Mar 5, 2024 | Diversity, Image Description | Code Available | 0 | 5 |
| IDEA: Image Description Enhanced CLIP-Adapter | Jan 15, 2025 | Few-Shot Image Classification, image-classification | Code Available | 0 | 5 |
| Describing Videos by Exploiting Temporal Structure | Feb 27, 2015 | Action Recognition, Image Description | Code Available | 0 | 5 |
| Bridging Languages through Images with Deep Partial Canonical Correlation Analysis | Jul 1, 2018 | Image Description, Image Retrieval | Code Available | 0 | 5 |
| How Do Image Description Systems Describe People? A Targeted Assessment of System Competence in the PEOPLE-domain | Dec 1, 2020 | Image Description | Code Available | 0 | 5 |
| Deep Imbalanced Attribute Classification using Visual Attention Aggregation | Jul 10, 2018 | Attribute, Classification | Code Available | 0 | 5 |