| Grounded Video Description | Dec 17, 2018 | Image Description, Sentence | Code Available | 1 |
| Visual Genome: Connecting Language and Vision Using Crowdsourced Dense Image Annotations | Feb 23, 2016 | Image Classification | Code Available | 1 |
| Flickr30k Entities: Collecting Region-to-Phrase Correspondences for Richer Image-to-Sentence Models | May 19, 2015 | Image Description, Phrase Grounding | Code Available | 1 |
| CIDEr: Consensus-based Image Description Evaluation | Nov 20, 2014 | Action Recognition, Attribute | Code Available | 1 |
| Advanced Chest X-Ray Analysis via Transformer-Based Image Descriptors and Cross-Model Attention Mechanism | Apr 23, 2025 | Decoder, Image Description | Unverified | 0 |
| LaMOuR: Leveraging Language Models for Out-of-Distribution Recovery in Reinforcement Learning | Mar 21, 2025 | Code Generation, Deep Reinforcement Learning | Unverified | 0 |
| VisBias: Measuring Explicit and Implicit Social Biases in Vision Language Models | Mar 10, 2025 | Image Description, Multiple-choice | Code Available | 0 |
| Boli: A dataset for understanding stuttering experience and analyzing stuttered speech | Jan 27, 2025 | Image Description | Unverified | 0 |
| IDEA: Image Description Enhanced CLIP-Adapter | Jan 15, 2025 | Few-Shot Image Classification, Image Classification | Code Available | 0 |
| Exploring the Use of Contrastive Language-Image Pre-Training for Human Posture Classification: Insights from Yoga Pose Analysis | Jan 13, 2025 | Image Description, Transfer Learning | Unverified | 0 |