| Group-based Distinctive Image Captioning with Memory Difference Encoding and Attention | Apr 3, 2025 | Caption GenerationContrastive Learning | —Unverified | 0 |
| Identifying Multi-modal Knowledge Neurons in Pretrained Transformers via Two-stage Filtering | Mar 29, 2025 | Caption Generationknowledge editing | —Unverified | 0 |
| LaPIG: Cross-Modal Generation of Paired Thermal and Visible Facial Images | Mar 20, 2025 | Caption GenerationDiversity | —Unverified | 0 |
| IDEA: Inverted Text with Cooperative Deformable Aggregation for Multi-modal Object Re-Identification | Mar 13, 2025 | Caption Generation | —Unverified | 0 |
| Integrating Frequency-Domain Representations with Low-Rank Adaptation in Vision-Language Models | Mar 8, 2025 | Caption GenerationQuestion Answering | —Unverified | 0 |
| Fine-Grained Video Captioning through Scene Graph Consolidation | Feb 23, 2025 | Caption GenerationImage Captioning | —Unverified | 0 |
| LongCaptioning: Unlocking the Power of Long Caption Generation in Large Multimodal Models | Feb 21, 2025 | Caption GenerationVideo Captioning | —Unverified | 0 |
| Enhancing Chest X-ray Classification through Knowledge Injection in Cross-Modality Learning | Feb 19, 2025 | Caption GenerationClassification | —Unverified | 0 |
| FE-LWS: Refined Image-Text Representations via Decoder Stacking and Fused Encodings for Remote Sensing Image Captioning | Feb 13, 2025 | Caption GenerationDecoder | —Unverified | 0 |
| Expertized Caption Auto-Enhancement for Video-Text Retrieval | Feb 5, 2025 | Caption GenerationRetrieval | CodeCode Available | 0 |