| Synthio: Augmenting Small-Scale Audio Classification Datasets with Synthetic Data | Oct 2, 2024 | Audio ClassificationCaption Generation | CodeCode Available | 1 |
| EzAudio: Enhancing Text-to-Audio Generation with Efficient Diffusion Transformer | Sep 17, 2024 | Audio GenerationCaption Generation | —Unverified | 0 |
| CoVLA: Comprehensive Vision-Language-Action Dataset for Autonomous Driving | Aug 19, 2024 | Autonomous DrivingCaption Generation | —Unverified | 0 |
| Mol2Lang-VLM: Vision- and Text-Guided Generative Pre-trained Language Models for Advancing Molecule Captioning through Multimodal Fusion | Aug 15, 2024 | Caption GenerationDecoder | CodeCode Available | 0 |
| See It All: Contextualized Late Aggregation for 3D Dense Captioning | Aug 14, 2024 | 3D dense captioningAll | —Unverified | 0 |
| Bi-directional Contextual Attention for 3D Dense Captioning | Aug 13, 2024 | 3D dense captioningAttribute | —Unverified | 0 |
| Dual-path Collaborative Generation Network for Emotional Video Captioning | Aug 6, 2024 | Caption GenerationVideo Captioning | CodeCode Available | 0 |
| SynthVLM: High-Efficiency and High-Quality Synthetic Data for Vision Language Models | Jul 30, 2024 | Caption GenerationQuestion Answering | CodeCode Available | 2 |
| XMeCap: Meme Caption Generation with Sub-Image Adaptability | Jul 24, 2024 | Caption GenerationMeme Captioning | —Unverified | 0 |
| Continual Panoptic Perception: Towards Multi-modal Incremental Interpretation of Remote Sensing Images | Jul 19, 2024 | Caption GenerationContinual Learning | CodeCode Available | 0 |
| Distractors-Immune Representation Learning with Cross-modal Contrastive Regularization for Change Captioning | Jul 16, 2024 | Caption Generationcross-modal alignment | CodeCode Available | 1 |
| Explainable Image Captioning using CNN- CNN architecture and Hierarchical Attention | Jun 28, 2024 | Caption GenerationDecoder | —Unverified | 0 |
| HCQA @ Ego4D EgoSchema Challenge 2024 | Jun 22, 2024 | Caption Generation | CodeCode Available | 1 |
| Does Object Grounding Really Reduce Hallucination of Large Vision-Language Models? | Jun 20, 2024 | Caption GenerationHallucination | —Unverified | 0 |
| Enhancing Cross-Prompt Transferability in Vision-Language Models through Contextual Injection of Target Tokens | Jun 19, 2024 | Caption Generationimage-classification | CodeCode Available | 0 |
| Humor in AI: Massive Scale Crowd-Sourced Preferences and Benchmarks for Cartoon Captioning | Jun 15, 2024 | Caption Generation | CodeCode Available | 0 |
| DS@BioMed at ImageCLEFmedical Caption 2024: Enhanced Attention Mechanisms in Medical Caption Generation through Concept Detection Integration | Jun 1, 2024 | Caption GenerationImage Captioning | —Unverified | 0 |
| Multi-Modal Generative Embedding Model | May 29, 2024 | Caption GenerationCross-Modal Retrieval | —Unverified | 0 |
| Less for More: Enhanced Feedback-aligned Mixed LLMs for Molecule Caption Generation and Fine-Grained NLI Evaluation | May 22, 2024 | Caption GenerationHallucination | —Unverified | 0 |
| MICap: A Unified Model for Identity-aware Movie Descriptions | May 19, 2024 | Caption GenerationDecoder | —Unverified | 0 |
| SoccerNet-Echoes: A Soccer Game Audio Commentary Dataset | May 12, 2024 | Action SpottingAutomatic Speech Recognition | CodeCode Available | 1 |
| Visual Fact Checker: Enabling High-Fidelity Detailed Caption Generation | Apr 30, 2024 | Caption GenerationHallucination | —Unverified | 0 |
| BCAmirs at SemEval-2024 Task 4: Beyond Words: A Multimodal and Multilingual Exploration of Persuasion in Memes | Apr 3, 2024 | Caption GenerationHierarchical Multi-label Classification | CodeCode Available | 1 |
| The Solution for the ICCV 2023 1st Scientific Figure Captioning Challenge | Mar 26, 2024 | Caption GenerationImage Captioning | —Unverified | 0 |
| LuoJiaHOG: A Hierarchy Oriented Geo-aware Image Caption Dataset for Remote Sensing Image-Text Retrival | Mar 16, 2024 | Caption GenerationImage-text Retrieval | —Unverified | 0 |
| PathM3: A Multimodal Multi-Task Multiple Instance Learning Framework for Whole Slide Image Classification and Captioning | Mar 13, 2024 | Caption GenerationDiagnostic | —Unverified | 0 |
| Enhancing Image Caption Generation Using Reinforcement Learning with Human Feedback | Mar 11, 2024 | Caption Generationreinforcement-learning | —Unverified | 0 |
| MeaCap: Memory-Augmented Zero-shot Image Captioning | Mar 6, 2024 | Caption GenerationImage Captioning | CodeCode Available | 2 |
| LLMs in Political Science: Heralding a New Era of Visual Analysis | Feb 29, 2024 | Caption GenerationFace Identification | —Unverified | 0 |
| Advancing Large Multi-modal Models with Explicit Chain-of-Reasoning and Visual Question Generation | Jan 18, 2024 | Caption GenerationLanguage Modeling | —Unverified | 0 |
| Social Media Ready Caption Generation for Brands | Jan 3, 2024 | Caption GenerationImage Captioning | —Unverified | 0 |
| BEV-TSR: Text-Scene Retrieval in BEV Space for Autonomous Driving | Jan 2, 2024 | Autonomous DrivingCaption Generation | —Unverified | 0 |
| Set Prediction Guided by Semantic Concepts for Diverse Video Captioning | Dec 25, 2023 | Caption GenerationDiversity | —Unverified | 0 |
| Automatic Report Generation for Histopathology images using pre-trained Vision Transformers and BERT | Dec 3, 2023 | Caption GenerationDecoder | CodeCode Available | 0 |
| Segment and Caption Anything | Dec 1, 2023 | Caption Generationobject-detection | CodeCode Available | 2 |
| Enhancing Image Captioning with Neural Models | Dec 1, 2023 | Caption GenerationImage Captioning | —Unverified | 0 |
| IG Captioner: Information Gain Captioners are Strong Zero-shot Classifiers | Nov 27, 2023 | Caption GenerationImage-text Retrieval | —Unverified | 0 |
| DECap: Towards Generalized Explicit Caption Editing via Diffusion Mechanism | Nov 25, 2023 | Caption GenerationDenoising | —Unverified | 0 |
| NeuSyRE: Neuro-Symbolic Visual Understanding and Reasoning Framework based on Scene Graph Enrichment | Nov 5, 2023 | Caption GenerationCommon Sense Reasoning | CodeCode Available | 1 |
| Dense Video Captioning: A Survey of Techniques, Datasets and Evaluation Protocols | Nov 5, 2023 | Caption GenerationDense Video Captioning | —Unverified | 0 |
| Visual Analytics for Efficient Image Exploration and User-Guided Image Captioning | Nov 2, 2023 | Caption GenerationEfficient Exploration | —Unverified | 0 |
| LoHoRavens: A Long-Horizon Language-Conditioned Benchmark for Robotic Tabletop Manipulation | Oct 18, 2023 | Caption GenerationInstruction Following | —Unverified | 0 |
| VidCoM: Fast Video Comprehension through Large Language Models with Multimodal Tools | Oct 16, 2023 | Caption GenerationDescriptive | —Unverified | 0 |
| ViPE: Visualise Pretty-much Everything | Oct 16, 2023 | Caption GenerationFigurative Language Visualization | CodeCode Available | 0 |
| VLIS: Unimodal Language Models Guide Multimodal Language Generation | Oct 15, 2023 | Caption GenerationExplanation Generation | CodeCode Available | 1 |
| A Comparative Study of Pre-trained CNNs and GRU-Based Attention for Image Caption Generation | Oct 11, 2023 | Caption GenerationDecoder | —Unverified | 0 |
| Self-supervised Cross-view Representation Reconstruction for Change Captioning | Sep 28, 2023 | Caption GenerationHallucination | CodeCode Available | 1 |
| FaceGemma: Enhancing Image Captioning with Facial Attributes for Portrait Images | Sep 24, 2023 | AttributeCaption Generation | —Unverified | 0 |
| Auto-ACD: A Large-scale Dataset for Audio-Language Representation Learning | Sep 20, 2023 | Audio captioningCaption Generation | —Unverified | 0 |
| RECAP: Retrieval-Augmented Audio Captioning | Sep 18, 2023 | AudioCapsAudio captioning | CodeCode Available | 1 |