| MAMS: Model-Agnostic Module Selection Framework for Video Captioning | Jan 30, 2025 | Caption Generation, Video Captioning | Unverified | 0 |
| Measuring and Mitigating Hallucinations in Vision-Language Dataset Generation for Remote Sensing | Jan 24, 2025 | Caption Generation, Dataset Generation | Unverified | 0 |
| Understanding How Paper Writers Use AI-Generated Captions in Figure Caption Writing | Jan 10, 2025 | Caption Generation | Unverified | 0 |
| Multi-LLM Collaborative Caption Generation in Scientific Documents | Jan 5, 2025 | Caption Generation, Image to text | Code Available | 0 |
| Time Series Language Model for Descriptive Caption Generation | Jan 3, 2025 | Caption Generation, Denoising | Unverified | 0 |
| Unleashing Text-to-Image Diffusion Prior for Zero-Shot Image Captioning | Dec 31, 2024 | Caption Generation, Decoder | Unverified | 0 |
| Multimodal Preference Data Synthetic Alignment with Reward Model | Dec 23, 2024 | 2k, Caption Generation | Code Available | 0 |
| Learning from Massive Human Videos for Universal Humanoid Pose Control | Dec 18, 2024 | Caption Generation, Humanoid Control | Unverified | 0 |
| From Simple to Professional: A Combinatorial Controllable Image Captioning Agent | Dec 15, 2024 | Caption Generation, controllable image captioning | Code Available | 0 |
| DIR: Retrieval-Augmented Image Captioning with Comprehensive Understanding | Dec 2, 2024 | Caption Generation, Domain Generalization | Unverified | 0 |