Feature selection in high-dimensional dataset using MapReduce
2017-09-07
Claudio Reggiani, Yann-Aël Le Borgne, Gianluca Bontempi
Code
- github.com/creggian/spark-ifs (official)
Abstract
This paper describes a distributed MapReduce implementation of the minimum Redundancy Maximum Relevance (mRMR) algorithm, a popular feature selection method in bioinformatics and network inference problems. The proposed approach handles both tall/narrow and wide/short datasets. We further provide an open-source implementation based on Hadoop/Spark, and illustrate its scalability on datasets involving millions of observations or features.
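To make the idea concrete, here is a minimal single-machine sketch of greedy mRMR in plain Python. It is not the authors' spark-ifs code: the function names (`contingency`, `mutual_information`, `mrmr`), the row-tuple data layout, and the chunking scheme are all assumptions made for illustration. It assumes discrete features and the common "relevance minus mean redundancy" scoring rule with mutual information, and it only imitates the map and reduce steps with local chunks rather than a cluster.

```python
from collections import Counter
from functools import reduce
from math import log


def contingency(rows, i, j, n_chunks=2):
    """Count (row[i], row[j]) pairs in a map/reduce style (hypothetical helper)."""
    # "Map": each chunk of rows builds its own partial contingency table.
    chunks = [rows[k::n_chunks] for k in range(n_chunks)]
    partial = [Counter((r[i], r[j]) for r in chunk) for chunk in chunks]
    # "Reduce": partial tables are merged by summing their counts.
    return reduce(lambda a, b: a + b, partial, Counter())


def mutual_information(joint):
    """Mutual information (in nats) from a table of joint counts."""
    n = sum(joint.values())
    px, py = Counter(), Counter()
    for (x, y), c in joint.items():
        px[x] += c
        py[y] += c
    # sum over cells of p(x,y) * log(p(x,y) / (p(x) p(y)))
    return sum((c / n) * log(c * n / (px[x] * py[y])) for (x, y), c in joint.items())


def mrmr(rows, feature_cols, target_col, k):
    """Greedily pick k columns maximizing relevance minus mean redundancy."""
    relevance = {
        f: mutual_information(contingency(rows, f, target_col)) for f in feature_cols
    }
    selected, candidates = [], set(feature_cols)
    while len(selected) < k and candidates:
        def score(f):
            if not selected:
                return relevance[f]
            redundancy = sum(
                mutual_information(contingency(rows, f, s)) for s in selected
            ) / len(selected)
            return relevance[f] - redundancy
        # sorted() makes tie-breaking deterministic (lowest column index wins).
        best = max(sorted(candidates), key=score)
        selected.append(best)
        candidates.remove(best)
    return selected


# Toy data: columns 0 and 1 are identical copies of bit a, column 2 is bit b,
# and the target (column 3) encodes both bits.
rows = [(a, a, b, 2 * a + b) for a in (0, 1) for b in (0, 1)]
print(mrmr(rows, feature_cols=[0, 1, 2], target_col=3, k=2))  # prints [0, 2]
```

On the toy data, column 1 is skipped because it is fully redundant with column 0, even though it is just as relevant to the target. In a distributed setting, the per-chunk counting and the merge in `contingency` are the parts that would map naturally onto MapReduce stages; how the paper partitions tall/narrow versus wide/short data is not shown here.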