| A-CAP: Anticipation Captioning with Commonsense Knowledge | Apr 13, 2023 | Image Captioning, Language Modeling | Unverified | 0 |
| Detecting and Grounding Important Characters in Visual Stories | Mar 30, 2023 | Visual Storytelling | Code Available | 0 |
| Positional Diffusion: Ordering Unordered Sets with Diffusion Probabilistic Models | Mar 20, 2023 | Graph Neural Network, Sentence | Code Available | 1 |
| Visual Writing Prompts: Character-Grounded Story Generation with Curated Image Sequences | Jan 20, 2023 | Coherence Evaluation, Grounded language learning | Unverified | 0 |
| A survey on knowledge-enhanced multimodal learning | Nov 19, 2022 | Conditional Image Generation, Factual Visual Question Answering | Unverified | 0 |
| DiMBERT: Learning Vision-Language Grounded Representations with Disentangled Multimodal-Attention | Oct 28, 2022 | Image Captioning, Language Modeling | Unverified | 0 |
| Bloom Library: Multimodal Datasets in 300+ Languages for a Variety of Downstream Tasks | Oct 26, 2022 | Image Captioning, Language Modeling | Unverified | 0 |
| Vision Transformer Based Model for Describing a Set of Images as a Story | Oct 6, 2022 | Language Modelling, Sentence | Unverified | 0 |
| Coherent Visual Storytelling via Parallel Top-Down Visual and Topic Attention | Aug 17, 2022 | Diversity, Sentence | Unverified | 0 |
| RoViST: Learning Robust Metrics for Visual Storytelling | Jul 1, 2022 | Sentence, Text Generation | Code Available | 0 |
| SentiStory: A Multi-Layered Sentiment-Aware Generative Model for Visual Storytelling | Jun 16, 2022 | Visual Storytelling | Unverified | 0 |
| Expressive Scene Graph Generation Using Commonsense Knowledge Infusion for Visual Understanding and Reasoning | May 31, 2022 | Common Sense Reasoning, Graph Generation | Code Available | 1 |
| RoViST: Learning Robust Metrics for Visual Storytelling | May 8, 2022 | Sentence, Text Generation | Code Available | 0 |
| Learning to Rank Visual Stories From Human Ranking Data | May 1, 2022 | Learning-To-Rank, Text Generation | Code Available | 0 |
| Knowledge-enriched Attention Network with Group-wise Semantic for Visual Storytelling | Mar 10, 2022 | Decoder, Story Generation | Unverified | 0 |
| A System for Image Understanding using Sensemaking and Narrative | Jan 21, 2022 | Visual Storytelling | Unverified | 0 |
| Discourse Analysis for Evaluating Coherence in Video Paragraph Captions | Jan 17, 2022 | Video Captioning, Visual Dialog | Unverified | 0 |
| Visual Storytelling with Hierarchical BERT Semantic Guidance | Jan 10, 2022 | Sentence, Text Generation | Unverified | 0 |
| RoViST: Learning Robust Metrics for Visual Storytelling | Dec 17, 2021 | Sentence, Text Generation | Unverified | 0 |
| Towards Coherent Visual Storytelling with Ordered Image Attention | Nov 16, 2021 | Position, Sentence | Unverified | 0 |
| Learning to Rank Visual Stories From Human Ranking Data | Nov 16, 2021 | Learning-To-Rank, Text Generation | Unverified | 0 |
| Graph Similarities and Dual Approach for Sequential Text-to-Image Retrieval | Sep 29, 2021 | Graph Embedding, Image Retrieval | Unverified | 0 |
| Ordered Attention for Coherent Visual Storytelling | Aug 4, 2021 | Sentence, Visual Storytelling | Unverified | 0 |
| Stretch-VST: Getting Flexible With Visual Stories | Aug 1, 2021 | Sentence, Visual Storytelling | Unverified | 0 |
| Two Heads are Better Than One: Hypergraph-Enhanced Graph Reasoning for Visual Event Ratiocination | Jul 18, 2021 | Visual Storytelling | Unverified | 0 |