Extending RevoScaleR for Mining Big Data – Discretization
R-bloggers 2013-04-12
Summary:
by Derek McCrae Norton, Senior Sales Engineer

In this second installment of Extending RevoScaleR for Mining Big Data, we look at how to use the building blocks provided by RevoScaleR to transform continuous variables into discrete ones.

Motivation: Discretize continuous variables on big data.

Discretization is a technique for converting continuous variables into discrete variables, and it is sometimes useful in data mining models such as Naïve Bayes. There are two basic methods, Equal Width and Equal Frequency, as well as many advanced methods such as Chi2, ChiMerge, and tree-based methods. If we consider the two basic methods, they are...
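
For a concrete picture of the two basic methods named above, here is a minimal sketch in base R. It is not the RevoScaleR approach from the post: it assumes the data fits in memory, and the sample vector `x` and bin count `k` are made up for illustration. Equal Width splits the range of the variable into intervals of the same length, while Equal Frequency places the cut points at quantiles.

```r
# Illustrative only: base R on a small in-memory vector, not RevoScaleR on big data.
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 100)  # made-up sample with one outlier
k <- 4                                   # number of bins (arbitrary choice)

# Equal Width: split the range [min, max] into k intervals of the same length.
width_breaks <- seq(min(x), max(x), length.out = k + 1)
equal_width <- cut(x, breaks = width_breaks, include.lowest = TRUE)

# Equal Frequency: put the cut points at quantiles so that each bin holds
# roughly the same number of observations.
freq_breaks <- quantile(x, probs = seq(0, 1, length.out = k + 1))
equal_freq <- cut(x, breaks = freq_breaks, include.lowest = TRUE)

# Compare how many observations land in each bin under the two schemes.
print(table(equal_width))
print(table(equal_freq))
```

With this sample, the single outlier stretches the equal-width bins so that nine of the ten values fall into the first one, whereas the equal-frequency bins hold 3, 2, 2, and 3 values. Note that `cut` fails if the quantile breaks are not unique, which can happen with heavily tied data.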