| Efficient Decentralized Visual Place Recognition From Full-Image Descriptors | May 30, 2017 | Clustering, Image Description | Code Available | 0 |
| Difficult Task Yes but Simple Task No: Unveiling the Laziness in Multimodal LLMs | Oct 15, 2024 | Image Description, Multiple-choice | Code Available | 0 |
| VisBias: Measuring Explicit and Implicit Social Biases in Vision Language Models | Mar 10, 2025 | Image Description, Multiple-choice | Code Available | 0 |
| Multi30K: Multilingual English-German Image Descriptions | May 2, 2016 | Image Description, Machine Translation | Code Available | 0 |
| Cross-linguistic differences and similarities in image descriptions | Jul 6, 2017 | Image Description, Specificity | Code Available | 0 |
| Multilingual Image Description with Neural Sequence Models | Oct 15, 2015 | Image Captioning, Image Description | Code Available | 0 |
| Room for improvement in automatic image description: an error analysis | Apr 13, 2017 | Image Description | Code Available | 0 |
| RRHF-V: Ranking Responses to Mitigate Hallucinations in Multimodal Large Language Models with Human Feedback | Jan 1, 2025 | Hallucination, Image Comprehension | Code Available | 0 |
| Bounding and Filling: A Fast and Flexible Framework for Image Captioning | Oct 15, 2023 | Image Captioning, Image Description | Code Available | 0 |
| Contextualize, Show and Tell: A Neural Visual Storyteller | Jun 3, 2018 | Decoder, Image Description | Code Available | 0 |