ParsVQA-Caps: A Benchmark for Visual Question Answering and Image Captioning in Persian
2022-12-07 · WiNLP 2022
Shaghayegh Mobasher, Ghazal Zamaninejad, Maryam Hashemi, Melika Nobakhtian, Sauleh Eetemadi
Abstract
Despite recent advances in vision-and-language tasks, most progress is still concentrated on resource-rich languages such as English. Furthermore, widely used vision-and-language datasets rely on images representative of American or European cultures, resulting in cultural bias. Hence, we introduce ParsVQA-Caps, the first Persian benchmark for Visual Question Answering and Image Captioning. We collect data for each task in two ways: human-based and template-based for VQA, and human-based and web-based for image captioning.