| Fine-Grained Video Captioning through Scene Graph Consolidation | Feb 23, 2025 | Caption GenerationImage Captioning | —Unverified | 0 |
| LongCaptioning: Unlocking the Power of Long Caption Generation in Large Multimodal Models | Feb 21, 2025 | Caption GenerationVideo Captioning | —Unverified | 0 |
| Enhancing Chest X-ray Classification through Knowledge Injection in Cross-Modality Learning | Feb 19, 2025 | Caption GenerationClassification | —Unverified | 0 |
| FE-LWS: Refined Image-Text Representations via Decoder Stacking and Fused Encodings for Remote Sensing Image Captioning | Feb 13, 2025 | Caption GenerationDecoder | —Unverified | 0 |
| Expertized Caption Auto-Enhancement for Video-Text Retrieval | Feb 5, 2025 | Caption GenerationRetrieval | CodeCode Available | 0 |
| Do Large Multimodal Models Solve Caption Generation for Scientific Figures? Lessons Learned from SCICAP Challenge 2023 | Jan 31, 2025 | ArticlesCaption Generation | —Unverified | 0 |
| MAMS: Model-Agnostic Module Selection Framework for Video Captioning | Jan 30, 2025 | Caption GenerationVideo Captioning | —Unverified | 0 |
| Measuring and Mitigating Hallucinations in Vision-Language Dataset Generation for Remote Sensing | Jan 24, 2025 | Caption GenerationDataset Generation | —Unverified | 0 |
| Understanding How Paper Writers Use AI-Generated Captions in Figure Caption Writing | Jan 10, 2025 | Caption Generation | —Unverified | 0 |
| Multi-LLM Collaborative Caption Generation in Scientific Documents | Jan 5, 2025 | Caption GenerationImage to text | CodeCode Available | 0 |
| Time Series Language Model for Descriptive Caption Generation | Jan 3, 2025 | Caption GenerationDenoising | —Unverified | 0 |
| Unleashing Text-to-Image Diffusion Prior for Zero-Shot Image Captioning | Dec 31, 2024 | Caption GenerationDecoder | —Unverified | 0 |
| Multimodal Preference Data Synthetic Alignment with Reward Model | Dec 23, 2024 | 2kCaption Generation | CodeCode Available | 0 |
| Learning from Massive Human Videos for Universal Humanoid Pose Control | Dec 18, 2024 | Caption GenerationHumanoid Control | —Unverified | 0 |
| From Simple to Professional: A Combinatorial Controllable Image Captioning Agent | Dec 15, 2024 | Caption Generationcontrollable image captioning | CodeCode Available | 0 |
| DIR: Retrieval-Augmented Image Captioning with Comprehensive Understanding | Dec 2, 2024 | Caption GenerationDomain Generalization | —Unverified | 0 |
| Benchmarking Multimodal Models for Ukrainian Language Understanding Across Academic and Cultural Domains | Nov 22, 2024 | BenchmarkingCaption Generation | —Unverified | 0 |
| Everything is a Video: Unifying Modalities through Next-Frame Prediction | Nov 15, 2024 | Caption GenerationCross-Modal Retrieval | —Unverified | 0 |
| Grounded Video Caption Generation | Nov 12, 2024 | Caption GenerationImage Captioning | —Unverified | 0 |
| SLAM-AAC: Enhancing Audio Captioning with Paraphrasing Augmentation and CLAP-Refine through LLMs | Oct 12, 2024 | AudioCapsAudio captioning | CodeCode Available | 0 |
| GEM-VPC: A dual Graph-Enhanced Multimodal integration for Video Paragraph Captioning | Oct 12, 2024 | Caption GenerationDecoder | —Unverified | 0 |
| EzAudio: Enhancing Text-to-Audio Generation with Efficient Diffusion Transformer | Sep 17, 2024 | Audio GenerationCaption Generation | —Unverified | 0 |
| CoVLA: Comprehensive Vision-Language-Action Dataset for Autonomous Driving | Aug 19, 2024 | Autonomous DrivingCaption Generation | —Unverified | 0 |
| Mol2Lang-VLM: Vision- and Text-Guided Generative Pre-trained Language Models for Advancing Molecule Captioning through Multimodal Fusion | Aug 15, 2024 | Caption GenerationDecoder | CodeCode Available | 0 |
| See It All: Contextualized Late Aggregation for 3D Dense Captioning | Aug 14, 2024 | 3D dense captioningAll | —Unverified | 0 |