HuaAMS at SemEval-2022 Task 8: Combining Translation and Domain Pre-training for Cross-lingual News Article Similarity

2022-07-01 · SemEval (NAACL) 2022

Sai Sandeep Sharma Chittilla, Talaat Khalil

Abstract

This paper describes our submission to the SemEval-2022 Multilingual News Article Similarity task. We experiment with different approaches that use a pre-trained language model fitted with a regression head to predict similarity scores for a given pair of news articles. Our best-performing systems include two key steps: (1) pre-training with in-domain data and (2) training data enrichment through machine translation. Our final submission is an ensemble of predictions from our top systems. While we show the significance of pre-training and augmentation, we believe the issue of language coverage calls for more attention.
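
The abstract does not say how the predictions of the top systems are combined. As a minimal sketch, assuming an unweighted mean over per-pair similarity scores (the function name `ensemble_scores` and the example scores are hypothetical and not taken from the paper), the ensembling step might look like this:

```python
from statistics import fmean


def ensemble_scores(system_predictions):
    """Average per-pair similarity scores across systems.

    system_predictions: one inner list per system, each holding one
    predicted score per article pair, in the same pair order.
    Assumption: an unweighted mean; the paper may combine differently.
    """
    if not system_predictions:
        raise ValueError("need at least one system")
    n_pairs = len(system_predictions[0])
    if any(len(p) != n_pairs for p in system_predictions):
        raise ValueError("all systems must score the same pairs")
    # zip(*...) groups the scores that different systems gave to the same pair
    return [fmean(scores) for scores in zip(*system_predictions)]


# Hypothetical scores from three systems for two article pairs
preds = [[1.0, 3.0], [2.0, 3.5], [3.0, 4.0]]
ensembled = ensemble_scores(preds)
```

A weighted mean or a learned combiner would fit the same interface; the unweighted mean is used here only because it requires no further assumptions about the systems.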

Tasks

Articles, Language Modelling, Machine Translation, Regression, Translation
