SOTAVerified

Exploring the Synergy Between Vision-Language Pretraining and ChatGPT for Artwork Captioning: A Preliminary Study

2023-01-21FAPER Workshop - ICIAP 2023 2023Code Available0· sign in to hype

Giovanna Castellano, Nicola Fanelli, Raffaele Scaringi, Gennaro Vessio

Code Available — Be the first to reproduce this paper.

Reproduce

Code

Abstract

While AI techniques have enabled automated analysis and interpretation of visual content, generating meaningful captions for artworks presents unique challenges. These include understanding artistic intent, historical context, and complex visual elements. Despite recent developments in multi-modal techniques, there are still gaps in generating complete and accurate captions. This paper contributes by introducing a new dataset for artwork captioning generated using prompt engineering techniques and ChatGPT. We refined the captions with CLIPScore to filter out noise; then, we fine-tuned GIT-Base, resulting in visually accurate captions that surpass the ground truth. Enrichment of descriptions with predicted metadata improves their informativeness. Artwork captioning has implications for art appreciation, inclusivity, education, and cultural exchange, particularly for people with visual impairments or limited knowledge of art.

Tasks

Reproductions