| Distractors-Immune Representation Learning with Cross-modal Contrastive Regularization for Change Captioning | Jul 16, 2024 | Caption Generationcross-modal alignment | CodeCode Available | 1 |
| Explainable Image Captioning using CNN- CNN architecture and Hierarchical Attention | Jun 28, 2024 | Caption GenerationDecoder | —Unverified | 0 |
| HCQA @ Ego4D EgoSchema Challenge 2024 | Jun 22, 2024 | Caption Generation | CodeCode Available | 1 |
| Does Object Grounding Really Reduce Hallucination of Large Vision-Language Models? | Jun 20, 2024 | Caption GenerationHallucination | —Unverified | 0 |
| Enhancing Cross-Prompt Transferability in Vision-Language Models through Contextual Injection of Target Tokens | Jun 19, 2024 | Caption Generationimage-classification | CodeCode Available | 0 |
| Humor in AI: Massive Scale Crowd-Sourced Preferences and Benchmarks for Cartoon Captioning | Jun 15, 2024 | Caption Generation | CodeCode Available | 0 |
| DS@BioMed at ImageCLEFmedical Caption 2024: Enhanced Attention Mechanisms in Medical Caption Generation through Concept Detection Integration | Jun 1, 2024 | Caption GenerationImage Captioning | —Unverified | 0 |
| Multi-Modal Generative Embedding Model | May 29, 2024 | Caption GenerationCross-Modal Retrieval | —Unverified | 0 |
| Less for More: Enhanced Feedback-aligned Mixed LLMs for Molecule Caption Generation and Fine-Grained NLI Evaluation | May 22, 2024 | Caption GenerationHallucination | —Unverified | 0 |
| MICap: A Unified Model for Identity-aware Movie Descriptions | May 19, 2024 | Caption GenerationDecoder | —Unverified | 0 |