Beyond Adjacency Pairs: Hierarchical Clustering of Long Sequences for Human-Machine Dialogues
2020-11-01 · EMNLP (CODI) 2020
Maitreyee Maitreyee
Abstract
This work proposes a framework for predicting sequences in dialogues using turn-based syntactic features and dialogue control functions. Syntactic features were extracted using dependency parsing, while dialogue control functions were manually labelled. These features were transformed using tf-idf and word embeddings; feature selection was done using Principal Component Analysis (PCA). We ran experiments on six combinations of features to predict sequences with Hierarchical Agglomerative Clustering. An analysis of the clustering results indicates that using word embeddings and syntactic features significantly improved the results.
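
To make the pipeline concrete, here is a minimal, dependency-free sketch of two of its stages: tf-idf weighting of per-turn features, followed by agglomerative clustering. It is an illustration under stated assumptions, not the paper's implementation. The toy turns and their feature tokens are invented, the choice of average linkage with cosine distance is an assumption (the abstract does not state the linkage or metric), and the word-embedding and PCA steps are left out to keep the example short.

```python
import math
from collections import Counter

# Hypothetical toy turns: each turn is a bag of feature tokens
# (made-up dependency labels plus a made-up dialogue control function tag).
turns = [
    "nsubj root dobj question",
    "nsubj root dobj question",
    "intj root feedback",
    "intj root feedback punct",
]

def tfidf_vectors(docs):
    """Return one L2-normalised tf-idf vector (list of floats) per document."""
    tokenised = [d.split() for d in docs]
    vocab = sorted({t for doc in tokenised for t in doc})
    n = len(tokenised)
    # Document frequency: number of turns in which each token occurs.
    df = Counter(t for doc in tokenised for t in set(doc))
    # Smoothed idf, so tokens present in every turn keep a non-zero weight.
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1 for t in vocab}
    vectors = []
    for doc in tokenised:
        tf = Counter(doc)
        vec = [tf[t] * idf[t] for t in vocab]
        norm = math.sqrt(sum(v * v for v in vec)) or 1.0
        vectors.append([v / norm for v in vec])
    return vectors

def cosine_distance(a, b):
    # Inputs are unit length, so the dot product is the cosine similarity.
    return 1.0 - sum(x * y for x, y in zip(a, b))

def agglomerative(vectors, n_clusters):
    """Average-linkage agglomerative clustering; returns one label per item."""
    clusters = [[i] for i in range(len(vectors))]
    while len(clusters) > n_clusters:
        best = None
        # Find the pair of clusters with the smallest mean pairwise distance.
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = sum(cosine_distance(vectors[a], vectors[b])
                        for a in clusters[i] for b in clusters[j])
                d /= len(clusters[i]) * len(clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        # Merge cluster j into cluster i (j > i, so index i stays valid).
        clusters[i] += clusters.pop(j)
    labels = [0] * len(vectors)
    for label, members in enumerate(clusters):
        for m in members:
            labels[m] = label
    return labels

labels = agglomerative(tfidf_vectors(turns), n_clusters=2)
print(labels)
```

On this toy input the two question-like turns should end up in one cluster and the two feedback-like turns in the other. In practice a library such as scikit-learn would replace the hand-written loops, and a PCA step would sit between the feature transformation and the clustering.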