| FlipSketch: Flipping Static Drawings to Text-Guided Sketch Animations | Nov 16, 2024 | Visual Storytelling | Code Available | 3 |
| Intelligent Grimm - Open-ended Visual Storytelling via Latent Diffusion Models | Jan 1, 2024 | Image Generation, Text to Image Generation | Code Available | 3 |
| Alfie: Democratising RGBA Image Generation With No $ | Aug 27, 2024 | Image Generation, Image Matting | Code Available | 2 |
| CoMM: A Coherent Interleaved Image-Text Dataset for Multimodal Understanding and Generation | Jun 15, 2024 | In-Context Learning, Text Generation | Code Available | 2 |
| Animate-A-Story: Storytelling with Retrieval-Augmented Video Generation | Jul 13, 2023 | Retrieval, Video Generation | Code Available | 2 |
| Intelligent Grimm -- Open-ended Visual Storytelling via Latent Diffusion Models | Jun 1, 2023 | Image Generation, Story Visualization | Code Available | 2 |
| StoryReasoning Dataset: Using Chain-of-Thought for Scene Understanding and Grounded Story Generation | May 15, 2025 | Face Recognition, Object | Code Available | 1 |
| Openstory++: A Large-scale Dataset and Benchmark for Instance-aware Open-domain Visual Storytelling | Aug 7, 2024 | Image Generation, Language Modelling | Code Available | 1 |
| Gorgeous: Create Your Desired Character Facial Makeup from Any Ideas | Apr 22, 2024 | Visual Storytelling | Code Available | 1 |
| inkn'hue: Enhancing Manga Colorization from Multiple Priors with Alignment Multi-Encoder VAE | Nov 3, 2023 | Colorization, Visual Storytelling | Code Available | 1 |
| TouchStone: Evaluating Vision-Language Models by Language Models | Aug 31, 2023 | Visual Storytelling | Code Available | 1 |
| Positional Diffusion: Ordering Unordered Sets with Diffusion Probabilistic Models | Mar 20, 2023 | Graph Neural Network, Sentence | Code Available | 1 |
| Expressive Scene Graph Generation Using Commonsense Knowledge Infusion for Visual Understanding and Reasoning | May 31, 2022 | Common Sense Reasoning, Graph Generation | Code Available | 1 |
| Plot and Rework: Modeling Storylines for Visual Storytelling | May 14, 2021 | Diversity, Form | Code Available | 1 |
| Shape2Animal: Creative Animal Generation from Natural Silhouettes | Jun 25, 2025 | Visual Storytelling | Unverified | 0 |
| JarvisArt: Liberating Human Artistic Creativity via an Intelligent Photo Retouching Agent | Jun 21, 2025 | Instruction Following, Large Language Model | Unverified | 0 |
| Consistent Story Generation with Asymmetry Zigzag Sampling | Jun 11, 2025 | Image Generation, Story Generation | Code Available | 0 |
| Camera Trajectory Generation: A Comprehensive Survey of Methods, Metrics, and Future Directions | Jun 1, 2025 | Visual Storytelling | Unverified | 0 |
| LoRAShop: Training-Free Multi-Concept Image Generation and Editing with Rectified Flow Transformers | May 29, 2025 | Denoising, Image Generation | Unverified | 0 |
| Action2Dialogue: Generating Character-Centric Narratives from Scene-Level Prompts | May 22, 2025 | Dialogue Generation, Large Language Model | Unverified | 0 |
| VIST-GPT: Ushering in the Era of Visual Storytelling with LLMs? | Apr 27, 2025 | Visual Grounding, Visual Storytelling | Unverified | 0 |
| FLIP Reasoning Challenge | Apr 16, 2025 | Common Sense Reasoning, image-classification | Code Available | 0 |
| GenDoP: Auto-regressive Camera Trajectory Generation as a Director of Photography | Apr 9, 2025 | Visual Storytelling | Unverified | 0 |
| Storybooth: Training-free Multi-Subject Consistency for Improved Visual Storytelling | Apr 8, 2025 | Image Generation, Text to Image Generation | Unverified | 0 |
| DANTE-AD: Dual-Vision Attention Network for Long-Term Audio Description | Mar 31, 2025 | Video Description, Video Understanding | Unverified | 0 |
| VinaBench: Benchmark for Faithful and Consistent Visual Narratives | Mar 26, 2025 | Visual Storytelling | Unverified | 0 |
| MIRAGE: Multimodal Immersive Reasoning and Guided Exploration for Red-Team Jailbreak Attacks | Mar 24, 2025 | Visual Storytelling | Unverified | 0 |
| Toyteller: AI-powered Visual Storytelling Through Toy-Playing with Character Symbols | Jan 23, 2025 | Motion Generation, Text Generation | Unverified | 0 |
| Generative Visual Communication in the Era of Vision-Language Models | Nov 27, 2024 | Visual Storytelling | Unverified | 0 |
| A Comprehensive Survey and Guide to Multimodal Large Language Models in Vision-Language Tasks | Nov 9, 2024 | Visual Storytelling | Unverified | 0 |
| KAHANI: Culturally-Nuanced Visual Storytelling Pipeline for Non-Western Cultures | Oct 25, 2024 | Story Generation, Visual Storytelling | Unverified | 0 |
| Multimodal Large Language Models and Tunings: Vision, Language, Sensors, Audio, and Beyond | Oct 8, 2024 | Question Answering, Visual Question Answering | Code Available | 0 |
| Generating Visual Stories with Grounded and Coreferent Characters | Sep 20, 2024 | Story Generation, Visual Storytelling | Unverified | 0 |
| Semantic Alignment for Multimodal Large Language Models | Aug 23, 2024 | Large Language Model, Visual Storytelling | Unverified | 0 |
| Story3D-Agent: Exploring 3D Storytelling Visualization with Large Language Models | Aug 21, 2024 | Logical Reasoning, Motion Synthesis | Unverified | 0 |
| Context-aware Visual Storytelling with Visual Prefix Tuning and Contrastive Learning | Aug 12, 2024 | Contrastive Learning, Informativeness | Unverified | 0 |
| ContextualStory: Consistent Visual Storytelling with Spatially-Enhanced and Storyline Context | Jul 13, 2024 | Image Generation, Story Continuation | Code Available | 0 |
| Not (yet) the whole story: Evaluating Visual Storytelling Requires More than Measuring Coherence, Grounding, and Repetition | Jul 5, 2024 | Visual Grounding, Visual Storytelling | Code Available | 0 |
| Improving Visual Storytelling with Multimodal Large Language Models | Jul 2, 2024 | Visual Storytelling | Unverified | 0 |
| TARN-VIST: Topic Aware Reinforcement Network for Visual Storytelling | Mar 18, 2024 | Image Captioning, Visual Storytelling | Unverified | 0 |
| AesopAgent: Agent-driven Evolutionary System on Story-to-Video Production | Mar 12, 2024 | Image Generation, RAG | Unverified | 0 |
| Metamorpheus: Interactive, Affective, and Creative Dream Narration Through Metaphorical Visual Storytelling | Mar 1, 2024 | ARC, Visual Storytelling | Unverified | 0 |
| SCO-VIST: Social Interaction Commonsense Knowledge-based Visual Storytelling | Feb 1, 2024 | Diversity, Image Captioning | Unverified | 0 |
| MagicScroll: Nontypical Aspect-Ratio Image Generation for Visual Storytelling via Multi-Layered Semantic-Aware Denoising | Dec 18, 2023 | Denoising, Image Generation | Unverified | 0 |
| DiffuVST: Narrating Fictional Scenes with Global-History-Guided Denoising Models | Dec 12, 2023 | Denoising, Diversity | Unverified | 0 |
| GROOViST: A Metric for Grounding Objects in Visual Storytelling | Oct 26, 2023 | Visual Grounding, Visual Storytelling | Code Available | 0 |
| Visual Storytelling with Question-Answer Plans | Oct 8, 2023 | Visual Storytelling | Unverified | 0 |
| Envisioning Narrative Intelligence: A Creative Visual Storytelling Anthology | Oct 6, 2023 | Story Generation, Visual Storytelling | Code Available | 0 |
| Comics for Everyone: Generating Accessible Text Descriptions for Comic Strips | Oct 1, 2023 | Language Modeling, Language Modelling | Unverified | 0 |
| Text-Only Training for Visual Storytelling | Aug 17, 2023 | Diversity, Informativeness | Unverified | 0 |