| Multi30K: Multilingual English-German Image Descriptions | May 2, 2016 | Image DescriptionMachine Translation | CodeCode Available | 0 | 5 |
| Contextualize, Show and Tell: A Neural Visual Storyteller | Jun 3, 2018 | DecoderImage Description | CodeCode Available | 0 | 5 |
| RRHF-V: Ranking Responses to Mitigate Hallucinations in Multimodal Large Language Models with Human Feedback | Jan 1, 2025 | HallucinationImage Comprehension | CodeCode Available | 0 | 5 |
| Focused Evaluation for Image Description with Binary Forced-Choice Tasks | Aug 1, 2016 | Image CaptioningImage Description | —Unverified | 0 | 0 |
| Findings of the Second Shared Task on Multimodal Machine Translation and Multilingual Image Description | Oct 19, 2017 | Image DescriptionMachine Translation | —Unverified | 0 | 0 |
| A Thousand Frames in Just a Few Words: Lingual Description of Videos through Latent Topics and Sparse Object Stitching | Jun 1, 2013 | Image DescriptionVideo Description | —Unverified | 0 | 0 |
| Facial Expression Recognition and Image Description Generation in Vietnamese | Aug 12, 2022 | DescriptiveEmotion Recognition | —Unverified | 0 | 0 |
| Face2Text revisited: Improved data set and baseline results | May 24, 2022 | Image DescriptionTransfer Learning | —Unverified | 0 | 0 |
| Exploring Visual Relationship for Image Captioning | Sep 19, 2018 | DecoderImage Captioning | —Unverified | 0 | 0 |
| Computer Vision and Conflicting Values: Describing People with Automated Alt Text | May 26, 2021 | Image Description | —Unverified | 0 | 0 |