| Intelligent Grimm - Open-ended Visual Storytelling via Latent Diffusion Models | Jan 1, 2024 | Image Generation, Text to Image Generation | Code Available | 3 |
| FlipSketch: Flipping Static Drawings to Text-Guided Sketch Animations | Nov 16, 2024 | Visual Storytelling | Code Available | 3 |
| Alfie: Democratising RGBA Image Generation With No $ | Aug 27, 2024 | Image Generation, Image Matting | Code Available | 2 |
| Animate-A-Story: Storytelling with Retrieval-Augmented Video Generation | Jul 13, 2023 | Retrieval, Video Generation | Code Available | 2 |
| Intelligent Grimm -- Open-ended Visual Storytelling via Latent Diffusion Models | Jun 1, 2023 | Image Generation, Story Visualization | Code Available | 2 |
| CoMM: A Coherent Interleaved Image-Text Dataset for Multimodal Understanding and Generation | Jun 15, 2024 | In-Context Learning, Text Generation | Code Available | 2 |
| inkn'hue: Enhancing Manga Colorization from Multiple Priors with Alignment Multi-Encoder VAE | Nov 3, 2023 | Colorization, Visual Storytelling | Code Available | 1 |
| Gorgeous: Create Your Desired Character Facial Makeup from Any Ideas | Apr 22, 2024 | Visual Storytelling | Code Available | 1 |
| StoryReasoning Dataset: Using Chain-of-Thought for Scene Understanding and Grounded Story Generation | May 15, 2025 | Face Recognition, Object | Code Available | 1 |
| Openstory++: A Large-scale Dataset and Benchmark for Instance-aware Open-domain Visual Storytelling | Aug 7, 2024 | Image Generation, Language Modelling | Code Available | 1 |
| TouchStone: Evaluating Vision-Language Models by Language Models | Aug 31, 2023 | Visual Storytelling | Code Available | 1 |
| Plot and Rework: Modeling Storylines for Visual Storytelling | May 14, 2021 | Diversity, Form | Code Available | 1 |
| Expressive Scene Graph Generation Using Commonsense Knowledge Infusion for Visual Understanding and Reasoning | May 31, 2022 | Common Sense Reasoning, Graph Generation | Code Available | 1 |
| Positional Diffusion: Ordering Unordered Sets with Diffusion Probabilistic Models | Mar 20, 2023 | Graph Neural Network, Sentence | Code Available | 1 |
| Bloom Library: Multimodal Datasets in 300+ Languages for a Variety of Downstream Tasks | Oct 26, 2022 | Image Captioning, Language Modeling | Unverified | 0 |
| Discourse Analysis for Evaluating Coherence in Video Paragraph Captions | Jan 17, 2022 | Video Captioning, Visual Dialog | Unverified | 0 |
| Keep it Consistent: Topic-Aware Storytelling from an Image Stream via Iterative Multi-agent Communication | Nov 11, 2019 | Image Captioning, Question Generation | Unverified | 0 |
| BERT-hLSTMs: BERT and Hierarchical LSTMs for Visual Storytelling | Dec 3, 2020 | Sentence, Visual Storytelling | Unverified | 0 |
| A Hierarchical Approach for Visual Storytelling Using Image Description | Sep 26, 2019 | Decoder, Image Description | Unverified | 0 |
| Action2Dialogue: Generating Character-Centric Narratives from Scene-Level Prompts | May 22, 2025 | Dialogue Generation, Large Language Model | Unverified | 0 |
| Knowledge-enriched Attention Network with Group-wise Semantic for Visual Storytelling | Mar 10, 2022 | Decoder, Story Generation | Unverified | 0 |
| A survey on knowledge-enhanced multimodal learning | Nov 19, 2022 | Conditional Image Generation, Factual Visual Question Answering | Unverified | 0 |
| DANTE-AD: Dual-Vision Attention Network for Long-Term Audio Description | Mar 31, 2025 | Video Description, Video Understanding | Unverified | 0 |
| A System for Image Understanding using Sensemaking and Narrative | Jan 21, 2022 | Visual Storytelling | Unverified | 0 |
| DiffuVST: Narrating Fictional Scenes with Global-History-Guided Denoising Models | Dec 12, 2023 | Denoising, Diversity | Unverified | 0 |
| DiMBERT: Learning Vision-Language Grounded Representations with Disentangled Multimodal-Attention | Oct 28, 2022 | Image Captioning, Language Modeling | Unverified | 0 |
| Context-aware Visual Storytelling with Visual Prefix Tuning and Contrastive Learning | Aug 12, 2024 | Contrastive Learning, Informativeness | Unverified | 0 |
| Diverse and Relevant Visual Storytelling with Scene Graph Embeddings | Nov 1, 2020 | Diversity, Story Generation | Unverified | 0 |
| Dixit: Interactive Visual Storytelling via Term Manipulation | Mar 6, 2019 | Decoder, Visual Storytelling | Unverified | 0 |
| Camera Trajectory Generation: A Comprehensive Survey of Methods, Metrics, and Future Directions | Jun 1, 2025 | Visual Storytelling | Unverified | 0 |
| A-CAP: Anticipation Captioning with Commonsense Knowledge | Apr 13, 2023 | Image Captioning, Language Modeling | Unverified | 0 |
| A Pipeline for Creative Visual Storytelling | Jul 21, 2018 | Visual Storytelling | Unverified | 0 |
| Graph Similarities and Dual Approach for Sequential Text-to-Image Retrieval | Sep 29, 2021 | Graph Embedding, Image Retrieval | Unverified | 0 |
| AesopAgent: Agent-driven Evolutionary System on Story-to-Video Production | Mar 12, 2024 | Image Generation, RAG | Unverified | 0 |
| JarvisArt: Liberating Human Artistic Creativity via an Intelligent Photo Retouching Agent | Jun 21, 2025 | Instruction Following, Large Language Model | Unverified | 0 |
| AOG-LSTM: An adaptive attention neural network for visual storytelling | Jun 26, 2023 | Decoder, Visual Storytelling | Unverified | 0 |
| Generative Visual Communication in the Era of Vision-Language Models | Nov 27, 2024 | Visual Storytelling | Unverified | 0 |
| Generating Visual Stories with Grounded and Coreferent Characters | Sep 20, 2024 | Story Generation, Visual Storytelling | Unverified | 0 |
| Comics for Everyone: Generating Accessible Text Descriptions for Comic Strips | Oct 1, 2023 | Language Modeling, Language Modelling | Unverified | 0 |
| Commonsense Knowledge Aware Concept Selection For Diverse and Informative Visual Storytelling | Feb 5, 2021 | Diversity, Informativeness | Unverified | 0 |
| Hide-and-Tell: Learning to Bridge Photo Streams for Visual Storytelling | Feb 3, 2020 | Image Captioning, Visual Storytelling | Unverified | 0 |
| Hierarchically-Attentive RNN for Album Summarization and Storytelling | Aug 9, 2017 | Retrieval, Visual Storytelling | Unverified | 0 |
| Hierarchically Structured Reinforcement Learning for Topically Coherent Visual Story Generation | May 21, 2018 | Decoder, Deep Reinforcement Learning | Unverified | 0 |
| Hierarchical memory decoder for visual narrating | Sep 1, 2020 | Decoder, Image Captioning | Unverified | 0 |
| Hierarchical Photo-Scene Encoder for Album Storytelling | Feb 2, 2019 | Decoder, Image-guided Story Ending Generation | Unverified | 0 |
| Imagine, Reason and Write: Visual Storytelling with Graph Knowledge and Relational Reasoning | May 18, 2021 | Diversity, Informativeness | Unverified | 0 |
| Improving Visual Storytelling with Multimodal Large Language Models | Jul 2, 2024 | Visual Storytelling | Unverified | 0 |
| Incorporating Textual Evidence in Visual Storytelling | Nov 21, 2019 | Object Recognition, Story Generation | Unverified | 0 |
| Induction and Reference of Entities in a Visual Story | Sep 15, 2019 | Sentence, Visual Storytelling | Unverified | 0 |
| GenDoP: Auto-regressive Camera Trajectory Generation as a Director of Photography | Apr 9, 2025 | Visual Storytelling | Unverified | 0 |