Multimedia Generative Script Learning
Given an activity goal $G$, an optional subgoal $M$ that specifies the concrete needs, and the previous multimedia step history $H_n=\{(S_1,V_1),\dots,(S_n,V_n)\}$ of length $n$, a model is expected to predict the next step $S_{n+1}$, where each $S_i$ is a text sequence and each $V_i$ is an image.
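To make the input and output of the task concrete, the following is a minimal Python sketch of how one training example could be laid out. The names `Step`, `ScriptInput`, and `make_example` are hypothetical and are not taken from the paper or its code release. Storing each image $V_i$ as a file path and allowing an empty history ($n = 0$) are also assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Step:
    """One multimedia step (S_i, V_i)."""
    text: str        # S_i: the step text
    image_path: str  # V_i: the step image, kept as a file path (assumption)


@dataclass
class ScriptInput:
    """Everything the model conditions on when predicting S_{n+1}."""
    goal: str                      # G: the activity goal
    history: List[Step]            # H_n: the first n multimedia steps
    subgoal: Optional[str] = None  # M: optional, may be absent


def make_example(
    goal: str,
    steps: List[Step],
    n: int,
    subgoal: Optional[str] = None,
) -> Tuple[ScriptInput, str]:
    """Split a full script into (input with history H_n, target text S_{n+1})."""
    if not 0 <= n < len(steps):
        raise ValueError("n must leave at least one step to predict")
    # steps[:n] is the history H_n; steps[n] holds the target S_{n+1}.
    return ScriptInput(goal=goal, history=steps[:n], subgoal=subgoal), steps[n].text
```

Only the text of step $n+1$ is returned as the target, since the definition asks for the next step $S_{n+1}$ and not for its image.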
Papers
| Title | Status | Hype |
|---|---|---|
| Multimedia Generative Script Learning for Task Planning | Code | 0 |
No leaderboard results yet.