Contrastive Event Extraction Using Video Enhancements
Anonymous
Abstract
Event extraction aims to identify event triggers and their associated arguments in text. Recent advanced methods leverage multiple modalities to tackle the task, but they pair the modalities without guaranteeing that event information is aligned across them, which degrades model performance. To address this issue, we first construct the Text Video Event Extraction (TVEE) dataset, containing 7,598 text-video pairs connected by event alignments, with an inter-annotator agreement of 83.4%. To the best of our knowledge, this is the first multimodal dataset with aligned event information in every sentence-video pair. Second, we present a Contrastive Learning based Event Extraction model with enhancements from the Video modality (CLEEV), which pairs videos and texts using event information. CLEEV constructs negative samples by weighting events according to the occurrence frequencies of their event types, enhancing the contrast. We conducted experiments on the TVEE and VM2E2 datasets, incorporating additional modalities to assist event extraction, and outperformed state-of-the-art methods by 1.0 and 1.2 percentage points in F-score, respectively. Our experimental results show that multimedia information improves event extraction from the textual modality. The dataset and code will be released upon acceptance.
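The abstract describes weighting contrastive negatives by the occurrence of their event types. A minimal sketch of one way such frequency-based weighting could feed an InfoNCE-style loss (the function names, the inverse-frequency formula, and the loss form are illustrative assumptions, not the paper's actual implementation):

```python
import math
from collections import Counter

def event_weights(event_types):
    """Hypothetical scheme: weight each sample inversely to the frequency
    of its event type, so negatives with rare event types contribute more
    contrast. Weights are normalized so they sum to the number of samples."""
    counts = Counter(event_types)
    n = len(event_types)
    return [n / (len(counts) * counts[t]) for t in event_types]

def cosine(u, v):
    """Cosine similarity between two dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def weighted_contrastive_loss(text_emb, video_embs, pos_idx, weights, tau=0.1):
    """InfoNCE-style loss: the matched video is the positive; every other
    video in the batch is a negative, scaled by its event-type weight."""
    sims = [math.exp(cosine(text_emb, v) / tau) for v in video_embs]
    pos = sims[pos_idx]
    denom = pos + sum(w * s
                      for i, (w, s) in enumerate(zip(weights, sims))
                      if i != pos_idx)
    return -math.log(pos / denom)
```

With a batch whose event types are `["Attack", "Attack", "Meet"]`, the rare `Meet` negative receives a larger weight than either `Attack` sample, so mismatched pairs from under-represented event types are penalized more strongly in the denominator.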