| Synthio: Augmenting Small-Scale Audio Classification Datasets with Synthetic Data | Oct 2, 2024 | Audio Classification, Caption Generation | Code Available | 1 |
| EzAudio: Enhancing Text-to-Audio Generation with Efficient Diffusion Transformer | Sep 17, 2024 | Audio Generation, Caption Generation | Unverified | 0 |
| CoVLA: Comprehensive Vision-Language-Action Dataset for Autonomous Driving | Aug 19, 2024 | Autonomous Driving, Caption Generation | Unverified | 0 |
| Mol2Lang-VLM: Vision- and Text-Guided Generative Pre-trained Language Models for Advancing Molecule Captioning through Multimodal Fusion | Aug 15, 2024 | Caption Generation, Decoder | Code Available | 0 |
| See It All: Contextualized Late Aggregation for 3D Dense Captioning | Aug 14, 2024 | 3D dense captioning, All | Unverified | 0 |
| Bi-directional Contextual Attention for 3D Dense Captioning | Aug 13, 2024 | 3D dense captioning, Attribute | Unverified | 0 |
| Dual-path Collaborative Generation Network for Emotional Video Captioning | Aug 6, 2024 | Caption Generation, Video Captioning | Code Available | 0 |
| SynthVLM: High-Efficiency and High-Quality Synthetic Data for Vision Language Models | Jul 30, 2024 | Caption Generation, Question Answering | Code Available | 2 |
| XMeCap: Meme Caption Generation with Sub-Image Adaptability | Jul 24, 2024 | Caption Generation, Meme Captioning | Unverified | 0 |
| Continual Panoptic Perception: Towards Multi-modal Incremental Interpretation of Remote Sensing Images | Jul 19, 2024 | Caption Generation, Continual Learning | Code Available | 0 |