Multi-Granularity Contrastive Knowledge Distillation for Multimodal Named Entity Recognition
Anonymous
Abstract
Recognizing named entities in short, informal multimodal posts is valuable in this age of information explosion. Despite the success of existing methods for multimodal named entity recognition (MNER), they rely on well-aligned text and image pairs, while the datasets contain a lot of noise. Moreover, it is difficult to establish a deep connection between text and image representations, even when they are internally correlated, because the text encoder and the image encoder operate at mismatched semantic levels. In this paper, we propose multi-granularity contrastive knowledge distillation (MGC) to build a unified joint representation space for the two modalities. By leveraging a multi-granularity contrastive loss, our approach pulls the representations of matched image-text or image-entity pairs together while pushing unrelated image-text or image-entity pairs apart. By utilizing the CLIP model for knowledge distillation, we obtain more fine-grained visual concepts. Experimental results on two benchmark datasets demonstrate the effectiveness of our method.
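To make the pull-together, push-apart behaviour of a contrastive loss concrete, the following is a minimal sketch and not the formulation used in the paper, which the abstract does not specify. It assumes a generic CLIP-style symmetric InfoNCE objective over a batch in which the i-th image embedding matches the i-th text embedding. The function names and the temperature value of 0.07 are illustrative assumptions. The same function could in principle be applied at a second granularity by passing entity embeddings in place of sentence embeddings.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def log_softmax(row):
    """Numerically stable log-softmax of one row of logits."""
    peak = max(row)
    log_sum = peak + math.log(sum(math.exp(x - peak) for x in row))
    return [x - log_sum for x in row]


def contrastive_loss(image_embs, text_embs, temperature=0.07):
    """Symmetric InfoNCE loss (assumed formulation, not taken from the paper).

    image_embs[i] and text_embs[i] are treated as a matched pair; every
    other combination in the batch is treated as an unrelated pair.
    """
    imgs = [l2_normalize(v) for v in image_embs]
    txts = [l2_normalize(v) for v in text_embs]
    n = len(imgs)

    # Cosine similarity between every image and every text, scaled by temperature.
    sim = [
        [sum(a * b for a, b in zip(imgs[i], txts[j])) / temperature for j in range(n)]
        for i in range(n)
    ]

    # Image-to-text direction: each image should select its own text.
    i2t = -sum(log_softmax(sim[i])[i] for i in range(n)) / n

    # Text-to-image direction: each text should select its own image.
    sim_t = [list(col) for col in zip(*sim)]
    t2i = -sum(log_softmax(sim_t[j])[j] for j in range(n)) / n

    return 0.5 * (i2t + t2i)


# Toy batch: matched pairs point in the same direction.
aligned = contrastive_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
# Same images, but the texts are swapped so every "matched" pair is unrelated.
shuffled = contrastive_loss([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
```

In this toy batch the loss is close to zero when matched pairs are aligned and large when they are swapped, which is the behaviour the objective rewards during training.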