Towards Semantic Noise Cleansing of Categorical Data based on Semantic Infusion

2020-02-06

Rishabh Gupta, Rajesh N Rao

Abstract

Semantic noise significantly affects text analytics in domain-specific industries. It impedes text understanding, which is of prime importance in critical decision-making tasks. In this work, we formalize semantic noise as a sequence of terms that do not contribute to the narrative of the text. We look beyond the notion of standard, statistically derived stop words and consider the semantics of terms in order to exclude semantic noise. We present a novel Semantic Infusion technique that associates metadata with categorical corpus text, and we demonstrate its near-lossless nature. Based on this technique, we propose an unsupervised text-preprocessing framework that filters semantic noise using the context of the terms. Finally, we present evaluation results for the proposed framework on a web forum dataset from the automobile domain.
