Test Suites Task: Evaluation of Gender Fairness in MT with MuST-SHE and INES

2023-10-30

Beatrice Savoldi, Marco Gaido, Matteo Negri, Luisa Bentivogli

Abstract

As part of the WMT-2023 "Test suites" shared task, this paper summarizes the results of two test suite evaluations: MuST-SHE-WMT23 and INES. Focusing on the en-de and de-en language pairs, we use these newly created test suites to investigate systems' ability to translate feminine and masculine gender and to produce gender-inclusive translations. Furthermore, we discuss the metrics associated with our test suites and validate them by means of human evaluations. Our results indicate that systems achieve reasonable and comparable performance in correctly translating both feminine and masculine gender forms for naturalistic gender phenomena. By contrast, generating inclusive language forms in translation emerges as a challenging task for all the evaluated MT models, indicating room for future improvements and research on the topic.
