| Positive-Augmented Contrastive Learning for Vision-and-Language Evaluation and Training | Oct 9, 2024 | Caption Generation, Contrastive Learning | Code Available | 2 |
| Fine-tuning Multimodal LLMs to Follow Zero-shot Demonstrative Instructions | Aug 8, 2023 | Caption Generation, Image Captioning | Code Available | 2 |
| AudioSetCaps: An Enriched Audio-Caption Dataset using Automated Generation Pipeline with Large Audio and Language Models | Nov 28, 2024 | Audio captioning, Audio to Text Retrieval | Code Available | 2 |
| Frame- and Segment-Level Features and Candidate Pool Evaluation for Video Caption Generation | Aug 17, 2016 | Caption Generation, Decoder | Code Available | 1 |
| Betrayed by Captions: Joint Caption Grounding and Generation for Open Vocabulary Instance Segmentation | Jan 2, 2023 | Caption Generation, Instance Segmentation | Code Available | 1 |
| End-to-End Dense Video Captioning with Parallel Decoding | Aug 17, 2021 | Caption Generation, Dense Video Captioning | Code Available | 1 |
| GL-RG: Global-Local Representation Granularity for Video Captioning | May 22, 2022 | Caption Generation, Descriptive | Code Available | 1 |
| Belief Revision based Caption Re-ranker with Visual Semantic Information | Sep 16, 2022 | Caption Generation, Image Captioning | Code Available | 1 |
| Deep Reinforcement Learning For Sequence to Sequence Models | May 24, 2018 | Abstractive Text Summarization, Caption Generation | Code Available | 1 |
| Distractors-Immune Representation Learning with Cross-modal Contrastive Regularization for Change Captioning | Jul 16, 2024 | Caption Generation, cross-modal alignment | Code Available | 1 |