Efficient Generation and Processing of Word Co-occurrence Networks Using corpus2graph

2018-06-01 · WS 2018

Zheng Zhang, Pierre Zweigenbaum, Ruiqing Yin

Abstract

Corpus2graph is an open-source, NLP-application-oriented tool that generates a word co-occurrence network from a large corpus. It not only provides built-in methods to preprocess words, analyze sentences, extract word pairs, and define edge weights, but also supports user-customized functions. Using parallelization, it can generate a large word co-occurrence network from the whole English Wikipedia within hours. Thanks to its three-level progressive calculation design (nodes, then edges, then weights), rebuilding a network with a different configuration is even faster, because it does not have to start from scratch. The tool can also serve as a front end to other graph libraries such as igraph, NetworkX, and graph-tool, supplying them with network data to speed up network generation.
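To make the nodes-edges-weights idea concrete, here is a minimal, stdlib-only sketch of one way such a progressive pipeline could work. It is not corpus2graph's implementation or API: the function names (`preprocess`, `build_node_ids`, `extract_pairs`, `compute_weights`), the whitespace tokenization, the sliding-window pair extraction, and the choice to store pair counts per token distance are all illustrative assumptions. Storing the distance with each pair is what lets the weight level be recomputed for a different window size or weighting function without rescanning the corpus.

```python
from collections import Counter
from typing import Callable, Dict, List, Tuple

# Hypothetical pair key: (node_u, node_v, distance) with node_u < node_v.
PairKey = Tuple[int, int, int]


def preprocess(lines: List[str]) -> List[List[str]]:
    """Hypothetical preprocessing: lowercase each line and split on whitespace."""
    return [line.lower().split() for line in lines]


def build_node_ids(sentences: List[List[str]]) -> Dict[str, int]:
    """Level 1 (nodes): assign each distinct token an integer id, in first-seen order."""
    vocab: Dict[str, int] = {}
    for sent in sentences:
        for tok in sent:
            if tok not in vocab:
                vocab[tok] = len(vocab)
    return vocab


def extract_pairs(sentences: List[List[str]], vocab: Dict[str, int],
                  max_window: int) -> Counter:
    """Level 2 (edges): count undirected word pairs at each distance 1..max_window.

    Keeping the distance in the key lets level 3 derive weights for any
    window <= max_window without reading the corpus again.
    """
    counts: Counter = Counter()
    for sent in sentences:
        ids = [vocab[tok] for tok in sent]
        for i, a in enumerate(ids):
            for d in range(1, max_window + 1):
                j = i + d
                if j >= len(ids):
                    break  # window runs past the end of the sentence
                b = ids[j]
                if a == b:
                    continue  # this sketch skips self-loops
                u, v = (a, b) if a < b else (b, a)  # undirected: order the ids
                counts[(u, v, d)] += 1
    return counts


def compute_weights(pair_counts: Counter, window: int,
                    weight_fn: Callable[[int, int], float] = lambda count, dist: count
                    ) -> Dict[Tuple[int, int], float]:
    """Level 3 (weights): sum weight_fn(count, distance) over distances <= window."""
    weights: Dict[Tuple[int, int], float] = {}
    for (u, v, d), c in pair_counts.items():
        if d <= window:
            weights[(u, v)] = weights.get((u, v), 0) + weight_fn(c, d)
    return weights


corpus = ["The cat sat on the mat", "the dog sat"]
sentences = preprocess(corpus)
vocab = build_node_ids(sentences)                      # level 1, computed once
pairs = extract_pairs(sentences, vocab, max_window=2)  # level 2, computed once

# Several weight configurations derived from the same stored pair counts:
w1 = compute_weights(pairs, window=1)
w2 = compute_weights(pairs, window=2)
w2_decay = compute_weights(pairs, window=2, weight_fn=lambda c, d: c / d)
```

The resulting `(u, v) -> weight` mapping could then be handed to a graph library; for example, NetworkX's `Graph.add_weighted_edges_from` accepts an iterable of `(u, v, weight)` triples. In a real tool, the level 1 and level 2 outputs would typically be written to disk so that only level 3 is rerun when the weighting configuration changes.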