| Synthio: Augmenting Small-Scale Audio Classification Datasets with Synthetic Data | Oct 2, 2024 | Audio Classification, Caption Generation | Code Available | 1 |
| EzAudio: Enhancing Text-to-Audio Generation with Efficient Diffusion Transformer | Sep 17, 2024 | Audio Generation, Caption Generation | — Unverified | 0 |
| CoVLA: Comprehensive Vision-Language-Action Dataset for Autonomous Driving | Aug 19, 2024 | Autonomous Driving, Caption Generation | — Unverified | 0 |
| Mol2Lang-VLM: Vision- and Text-Guided Generative Pre-trained Language Models for Advancing Molecule Captioning through Multimodal Fusion | Aug 15, 2024 | Caption Generation, Decoder | Code Available | 0 |
| See It All: Contextualized Late Aggregation for 3D Dense Captioning | Aug 14, 2024 | 3D dense captioning, All | — Unverified | 0 |
| Bi-directional Contextual Attention for 3D Dense Captioning | Aug 13, 2024 | 3D dense captioning, Attribute | — Unverified | 0 |
| Dual-path Collaborative Generation Network for Emotional Video Captioning | Aug 6, 2024 | Caption Generation, Video Captioning | Code Available | 0 |
| SynthVLM: High-Efficiency and High-Quality Synthetic Data for Vision Language Models | Jul 30, 2024 | Caption Generation, Question Answering | Code Available | 2 |
| XMeCap: Meme Caption Generation with Sub-Image Adaptability | Jul 24, 2024 | Caption Generation, Meme Captioning | — Unverified | 0 |
| Continual Panoptic Perception: Towards Multi-modal Incremental Interpretation of Remote Sensing Images | Jul 19, 2024 | Caption Generation, Continual Learning | Code Available | 0 |
| Distractors-Immune Representation Learning with Cross-modal Contrastive Regularization for Change Captioning | Jul 16, 2024 | Caption Generation, cross-modal alignment | Code Available | 1 |
| Explainable Image Captioning using CNN- CNN architecture and Hierarchical Attention | Jun 28, 2024 | Caption Generation, Decoder | — Unverified | 0 |
| HCQA @ Ego4D EgoSchema Challenge 2024 | Jun 22, 2024 | Caption Generation | Code Available | 1 |
| Does Object Grounding Really Reduce Hallucination of Large Vision-Language Models? | Jun 20, 2024 | Caption Generation, Hallucination | — Unverified | 0 |
| Enhancing Cross-Prompt Transferability in Vision-Language Models through Contextual Injection of Target Tokens | Jun 19, 2024 | Caption Generation, image-classification | Code Available | 0 |
| Humor in AI: Massive Scale Crowd-Sourced Preferences and Benchmarks for Cartoon Captioning | Jun 15, 2024 | Caption Generation | Code Available | 0 |
| DS@BioMed at ImageCLEFmedical Caption 2024: Enhanced Attention Mechanisms in Medical Caption Generation through Concept Detection Integration | Jun 1, 2024 | Caption Generation, Image Captioning | — Unverified | 0 |
| Multi-Modal Generative Embedding Model | May 29, 2024 | Caption Generation, Cross-Modal Retrieval | — Unverified | 0 |
| Less for More: Enhanced Feedback-aligned Mixed LLMs for Molecule Caption Generation and Fine-Grained NLI Evaluation | May 22, 2024 | Caption Generation, Hallucination | — Unverified | 0 |
| MICap: A Unified Model for Identity-aware Movie Descriptions | May 19, 2024 | Caption Generation, Decoder | — Unverified | 0 |
| SoccerNet-Echoes: A Soccer Game Audio Commentary Dataset | May 12, 2024 | Action Spotting, Automatic Speech Recognition | Code Available | 1 |
| Visual Fact Checker: Enabling High-Fidelity Detailed Caption Generation | Apr 30, 2024 | Caption Generation, Hallucination | — Unverified | 0 |
| BCAmirs at SemEval-2024 Task 4: Beyond Words: A Multimodal and Multilingual Exploration of Persuasion in Memes | Apr 3, 2024 | Caption Generation, Hierarchical Multi-label Classification | Code Available | 1 |
| The Solution for the ICCV 2023 1st Scientific Figure Captioning Challenge | Mar 26, 2024 | Caption Generation, Image Captioning | — Unverified | 0 |
| LuoJiaHOG: A Hierarchy Oriented Geo-aware Image Caption Dataset for Remote Sensing Image-Text Retrival | Mar 16, 2024 | Caption Generation, Image-text Retrieval | — Unverified | 0 |