| Using Inter-Sentence Diverse Beam Search to Reduce Redundancy in Visual Storytelling | May 30, 2018 | Image to text, Sentence | —Unverified | 0 | 0 |
| A System for Image Understanding using Sensemaking and Narrative | Jan 21, 2022 | Visual Storytelling | —Unverified | 0 | 0 |
| A survey on knowledge-enhanced multimodal learning | Nov 19, 2022 | Conditional Image Generation, Factual Visual Question Answering | —Unverified | 0 | 0 |
| A Pipeline for Creative Visual Storytelling | Jul 21, 2018 | Visual Storytelling | —Unverified | 0 | 0 |
| JarvisArt: Liberating Human Artistic Creativity via an Intelligent Photo Retouching Agent | Jun 21, 2025 | Instruction Following, Large Language Model | —Unverified | 0 | 0 |
| KAHANI: Culturally-Nuanced Visual Storytelling Pipeline for Non-Western Cultures | Oct 25, 2024 | Story Generation, Visual Storytelling | —Unverified | 0 | 0 |
| Keep it Consistent: Topic-Aware Storytelling from an Image Stream via Iterative Multi-agent Communication | Nov 11, 2019 | Image Captioning, Question Generation | —Unverified | 0 | 0 |
| VinaBench: Benchmark for Faithful and Consistent Visual Narratives | Mar 26, 2025 | Visual Storytelling | —Unverified | 0 | 0 |
| Knowledge-enriched Attention Network with Group-wise Semantic for Visual Storytelling | Mar 10, 2022 | Decoder, Story Generation | —Unverified | 0 | 0 |
| Vision Transformer Based Model for Describing a Set of Images as a Story | Oct 6, 2022 | Language Modelling, Sentence | —Unverified | 0 | 0 |
| Learning to Rank Visual Stories From Human Ranking Data | Nov 16, 2021 | Learning-To-Rank, Text Generation | —Unverified | 0 | 0 |
| VIST-GPT: Ushering in the Era of Visual Storytelling with LLMs? | Apr 27, 2025 | Visual Grounding, Visual Storytelling | —Unverified | 0 | 0 |
| LoRAShop: Training-Free Multi-Concept Image Generation and Editing with Rectified Flow Transformers | May 29, 2025 | Denoising, Image Generation | —Unverified | 0 | 0 |
| MagicScroll: Nontypical Aspect-Ratio Image Generation for Visual Storytelling via Multi-Layered Semantic-Aware Denoising | Dec 18, 2023 | Denoising, Image Generation | —Unverified | 0 | 0 |
| Metamorpheus: Interactive, Affective, and Creative Dream Narration Through Metaphorical Visual Storytelling | Mar 1, 2024 | ARC, Visual Storytelling | —Unverified | 0 | 0 |
| MIRAGE: Multimodal Immersive Reasoning and Guided Exploration for Red-Team Jailbreak Attacks | Mar 24, 2025 | Visual Storytelling | —Unverified | 0 | 0 |
| Visual Chain of Thought: Bridging Logical Gaps with Multimodal Infillings | May 3, 2023 | Data Augmentation, Question Answering | —Unverified | 0 | 0 |
| "My Way of Telling a Story": Persona based Grounded Story Generation | Jun 14, 2019 | Decoder, Story Generation | —Unverified | 0 | 0 |
| "My Way of Telling a Story": Persona based Grounded Story Generation | Aug 1, 2019 | Decoder, Story Generation | —Unverified | 0 | 0 |
| Neural Event Extraction from Movies Description | Jun 1, 2018 | Event Extraction, Machine Translation | —Unverified | 0 | 0 |
| Visual Storytelling with Question-Answer Plans | Oct 8, 2023 | Visual Storytelling | —Unverified | 0 | 0 |
| A-CAP: Anticipation Captioning with Commonsense Knowledge | Apr 13, 2023 | Image Captioning, Language Modeling | —Unverified | 0 | 0 |
| On How Users Edit Computer-Generated Visual Stories | Feb 22, 2019 | Articles, Diversity | —Unverified | 0 | 0 |
| AOG-LSTM: An adaptive attention neural network for visual storytelling | Jun 26, 2023 | Decoder, Visual Storytelling | —Unverified | 0 | 0 |
| A Hierarchical Approach for Visual Storytelling Using Image Description | Sep 26, 2019 | Decoder, Image Description | —Unverified | 0 | 0 |