``Caption'' as a Coherence Relation: Evidence and Implications
Malihe Alikhani, Matthew Stone
Abstract
We study verbs in image--text corpora, contrasting caption corpora, where texts are explicitly written to characterize image content, with depiction corpora, where texts and images may stand in more general relations. Captions show a distinctively limited distribution of verbs, with strong preferences for specific tense, aspect, lexical aspect, and semantic field. These limitations, which appear in data elicited by a range of methods, restrict the utility of caption corpora to inform image retrieval, multimodal document generation, and perceptually-grounded semantic models. We suggest that these limitations reflect the discourse constraints in play when subjects write texts to accompany imagery, so we argue that future development of image--text corpora should work to increase the diversity of event descriptions, while looking explicitly at the different ways text and imagery can be coherently related.