
Comparison of String Similarity Measures for Obscenity Filtering

2017-04-01 · WS 2017

Ekaterina Chernyak


Abstract

In this paper, we address the problem of filtering obscene lexis in Russian texts. We use string similarity measures to find words that are similar or identical to words from a stop list, and we establish both a test collection and a baseline for the task. Our experiments show that a novel string similarity measure based on the notion of an annotated suffix tree outperforms some other well-known measures.
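To illustrate the general idea of matching words against a stop list with a string similarity measure, here is a minimal sketch. It uses normalized Levenshtein similarity, which is a common measure and not the annotated suffix tree measure proposed in the paper. The function names, the placeholder stop word, and the threshold of 0.75 are assumptions made for this example only.

```python
from typing import Iterable, List


def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit-cost insertions, deletions, and substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Map edit distance to [0, 1]; 1.0 means the strings are identical."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


def flag_words(tokens: Iterable[str], stop_list: Iterable[str],
               threshold: float = 0.75) -> List[str]:
    """Return tokens whose similarity to any stop-list word meets the threshold.

    The threshold is a hypothetical value chosen for illustration only.
    """
    stops = [s.lower() for s in stop_list]
    return [t for t in tokens
            if any(similarity(t.lower(), s) >= threshold for s in stops)]


if __name__ == "__main__":
    # "badword" is a neutral placeholder standing in for a stop-list entry.
    print(flag_words(["badw0rd", "hello"], ["badword"]))
```

A threshold-based filter of this kind trades recall for precision: lowering the threshold catches more disguised spellings but also flags more innocent words, which is why a test collection is needed to compare measures.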
