KanCMD: Kannada CodeMixed Dataset for Sentiment Analysis and Offensive Language Detection

2020-12-01 · COLING (PEOPLES) 2020

Adeep Hande, Ruba Priyadharshini, Bharathi Raja Chakravarthi

Abstract

We introduce the Kannada CodeMixed Dataset (KanCMD), a multi-task learning dataset for sentiment analysis and offensive language identification. KanCMD addresses two real-world issues in social media text. First, it contains actual code-mixed comments posted by users on YouTube, rather than monolingual text from textbooks. Second, it is annotated for two tasks, sentiment analysis and offensive language detection, for the under-resourced Kannada language. KanCMD is therefore meant to stimulate research on real-world code-mixed social media text and multi-task learning for Kannada. The dataset was obtained by crawling YouTube, and each comment was annotated by at least three annotators. We release KanCMD, comprising 7,671 comments, for multi-task learning research.
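The abstract does not say how the three or more annotations per comment are combined into a final label. As a minimal sketch, assuming strict majority voting, one comment carrying labels for both tasks might be represented and aggregated as below. The class name, field names, and label strings are hypothetical and are not taken from the KanCMD release.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class AnnotatedComment:
    """One code-mixed comment with per-annotator votes for both tasks.

    Field names and label strings are illustrative assumptions,
    not the schema of the released dataset.
    """
    text: str
    sentiment_votes: List[str]   # one sentiment label per annotator
    offensive_votes: List[str]   # one offensiveness label per annotator


def majority_label(votes: List[str]) -> Optional[str]:
    """Return the label chosen by a strict majority of annotators, else None."""
    if not votes:
        return None
    label, count = Counter(votes).most_common(1)[0]
    return label if count > len(votes) / 2 else None


def aggregate(comment: AnnotatedComment) -> Dict[str, Optional[str]]:
    """Collapse per-annotator votes into one label per task (assumed scheme)."""
    return {
        "text": comment.text,
        "sentiment": majority_label(comment.sentiment_votes),
        "offensive": majority_label(comment.offensive_votes),
    }


# Placeholder comment with three annotators per task.
example = AnnotatedComment(
    text="placeholder comment",
    sentiment_votes=["positive", "positive", "negative"],
    offensive_votes=["not_offensive", "not_offensive", "not_offensive"],
)
record = aggregate(example)
```

Keeping both labels on the same record is what lets a single model train on the two tasks jointly. Comments without a strict majority come back as None here, so they can be reviewed or dropped rather than silently assigned a label.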

Tasks

Language Identification, Multi-Task Learning, Sentiment Analysis
