Story Continuation
In story continuation, the model is given an initial scene, which is typically available in real-world use cases, and must generate the images that follow it. Because this scene is part of the input, the model can copy and adapt elements from it as it generates subsequent images.
Source: StoryDALL-E: Adapting Pretrained Text-to-Image Transformers for Story Continuation
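
The task setup described above can be sketched as a simple loop. This is a minimal illustration, not the StoryDALL-E implementation: `StoryContinuationExample`, `continue_story`, and the `generate_frame` callable are hypothetical names introduced here, and passing previously generated frames to the generator is an assumption that not every model relies on.

```python
from dataclasses import dataclass
from typing import Any, Callable, List


@dataclass
class StoryContinuationExample:
    """One story: an initial scene plus a caption for each frame to generate."""
    source_frame: Any      # the initial scene supplied as input
    captions: List[str]    # one caption per subsequent frame


def continue_story(
    example: StoryContinuationExample,
    generate_frame: Callable[[str, Any, List[Any]], Any],
) -> List[Any]:
    """Generate the subsequent frames, conditioning each one on the source frame.

    `generate_frame` is a hypothetical stand-in for a real image generator.
    It receives the caption, the source frame, and the frames generated so far.
    """
    frames: List[Any] = []
    for caption in example.captions:
        # Pass a copy of the history so the generator cannot mutate it.
        frame = generate_frame(caption, example.source_frame, list(frames))
        frames.append(frame)
    return frames
```

Any callable with the same three arguments can be plugged in as `generate_frame`, which keeps the sketch independent of a particular model.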
Benchmark Results
| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | StoryDALL-E (Story Embeddings + Cross-Attention) | FID | 36.28 | — | Unverified |
| 2 | StoryDALL-E (Cross-Attention) | FID | 35.04 | — | Unverified |
| 3 | StoryDALL-E (Story Embeddings) | FID | 29.21 | — | Unverified |
| 4 | StoryDALL-E | FID | 28.37 | — | Unverified |
| 5 | AR-LDM | FID | 19.28 | — | Unverified |
| 6 | ContextualStory | FID | 16.33 | — | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | StoryDALL-E (Story Embeddings + Cross-Attention) | FID | 31.68 | — | Unverified |
| 2 | StoryDALL-E (Story Embeddings) | FID | 30.45 | — | Unverified |
| 3 | StoryDALL-E (Cross-Attention) | FID | 23.27 | — | Unverified |
| 4 | StoryDALL-E | FID | 21.64 | — | Unverified |
| 5 | AR-LDM | FID | 17.4 | — | Unverified |
| 6 | ContextualStory | FID | 14.2 | — | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | AR-LDM (DII captions) | FID | 17.03 | — | Unverified |
| 2 | AR-LDM (SIS captions) | FID | 16.95 | — | Unverified |
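
The metric in the tables above is FID (Fréchet Inception Distance), where lower values are better. As a rough sketch of what the number measures, the code below computes the Fréchet distance between Gaussians fitted to two sets of feature vectors. In practice the features come from an Inception network applied to real and generated images. Here random arrays stand in for them, and the function name `frechet_distance` is chosen for illustration only. It is not the evaluation code behind any of the reported results.

```python
import numpy as np


def frechet_distance(feats_real: np.ndarray, feats_gen: np.ndarray) -> float:
    """Fréchet distance between Gaussians fitted to two feature sets.

    Both inputs have shape (n_samples, n_features).
    """
    mu_r = feats_real.mean(axis=0)
    mu_g = feats_gen.mean(axis=0)
    cov_r = np.atleast_2d(np.cov(feats_real, rowvar=False))
    cov_g = np.atleast_2d(np.cov(feats_gen, rowvar=False))

    # Tr((cov_r cov_g)^(1/2)) equals the sum of the square roots of the
    # eigenvalues of cov_r @ cov_g. These are real and non-negative in exact
    # arithmetic, so small numerical noise is dropped and clipped here.
    eigvals = np.linalg.eigvals(cov_r @ cov_g)
    tr_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()

    diff = mu_r - mu_g
    return float(diff @ diff + np.trace(cov_r) + np.trace(cov_g) - 2.0 * tr_sqrt)


# Placeholder features (assumption: real features would come from Inception).
rng = np.random.default_rng(0)
real = rng.normal(size=(500, 4))
same = frechet_distance(real, real)           # identical sets: close to 0
shifted = frechet_distance(real, real + 3.0)  # mean shift of 3 in 4 dims: close to 36
```

Shifting every feature by a constant leaves the covariance unchanged, so the distance reduces to the squared norm of the mean difference, which makes the second call a convenient sanity check.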