Accounting for Language Effect in the Evaluation of Cross-lingual AMR Parsers

2022-10-01 · COLING 2022 · Code Available

Shira Wein, Nathan Schneider

Abstract

Cross-lingual Abstract Meaning Representation (AMR) parsers are currently evaluated in comparison to gold English AMRs, despite parsing a language other than English, due to the lack of multilingual AMR evaluation metrics. This evaluation practice is problematic because of the established effect of source language on AMR structure. In this work, we present three multilingual adaptations of monolingual AMR evaluation metrics and compare the performance of these metrics to sentence-level human judgments. We then use our most highly correlated metric to evaluate the output of state-of-the-art cross-lingual AMR parsers, finding that Smatch may still be a useful metric in comparison to gold English AMRs, while our multilingual adaptation of S2match (XS2match) is best for comparison with gold in-language AMRs.
