
The Impact of Text Normalization on Multiword Expressions Discovery in Persian

2021-09-01 · RANLP 2021

Katarzyna Marszałek-Kowalewska


Abstract

This paper evaluates normalization procedures for Persian text in the context of a downstream NLP task: multiword expression (MWE) discovery. We discuss the challenges that the Persian language poses for NLP and evaluate open-source tools that try to address them. The best-performing tool is then used in the main task, MWE discovery. To discover MWEs, we use association measures and a subset of the MirasText corpus. The results show that the F-score is 26% higher when the input data is normalized.
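The abstract does not say which normalization steps or which association measures were used, so the following is only an illustrative sketch of the general pipeline, not the paper's method. It assumes two things that are not stated in the source: that normalization includes mapping Arabic code points for yeh and kaf to their Persian counterparts, and that pointwise mutual information (PMI) over adjacent word pairs serves as the association measure. The names `CHAR_MAP`, `normalize`, `pmi_scores`, and `rank_candidates` are hypothetical.

```python
import math
from collections import Counter

# Assumption: a tiny character map covering two well-known variants.
# Real Persian normalizers handle many more cases (e.g. zero-width non-joiners).
CHAR_MAP = str.maketrans({
    "\u064A": "\u06CC",  # Arabic yeh -> Persian yeh
    "\u0643": "\u06A9",  # Arabic kaf -> Persian keheh
})


def normalize(text):
    """Unify character variants and collapse runs of whitespace."""
    return " ".join(text.translate(CHAR_MAP).split())


def pmi_scores(tokens, min_count=1):
    """Return {bigram: PMI} for adjacent pairs seen at least min_count times."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())
    scores = {}
    for (w1, w2), count in bigrams.items():
        if count < min_count:
            continue
        p_xy = count / n_bi            # joint probability of the pair
        p_x = unigrams[w1] / n_uni     # marginal probability of the first word
        p_y = unigrams[w2] / n_uni     # marginal probability of the second word
        scores[(w1, w2)] = math.log2(p_xy / (p_x * p_y))
    return scores


def rank_candidates(text, min_count=1):
    """Normalize, tokenize on whitespace, and rank bigrams by descending PMI."""
    tokens = normalize(text).split()
    scores = pmi_scores(tokens, min_count)
    return sorted(scores, key=scores.get, reverse=True)
```

Without the normalization step, two spellings of the same word that differ only in an Arabic versus Persian code point would be counted as different tokens, splitting the frequency counts that the association measure relies on. That is one plausible reason normalized input could help MWE discovery, though the abstract itself does not give the mechanism.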
