Vtreat: designing a package for variable treatment

Win-Vector Blog 2014-09-04

Summary:

When you apply machine learning algorithms on a regular basis, on a wide variety of data sets, you find that certain data issues come up again and again: missing values (NA or blanks); problematic numerical values (Inf, NaN, sentinel values like 999999999 or -1); valid categorical levels that don’t appear in the training data (especially […]
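
As a rough illustration of the data issues the summary lists, here is a minimal Python sketch. It is not the vtreat API (vtreat is an R package), and the helper names `treat_numeric` and `encode_levels`, the `SENTINELS` set, and the mean-fill strategy are all assumptions made for this example only.

```python
import math

# Hypothetical sentinel codes (assumption); real data sets define their own.
SENTINELS = {999999999, -1}


def treat_numeric(values, sentinels=SENTINELS):
    """Replace missing or problematic numeric values with the mean of the
    usable ones, and return an indicator column marking the replacements."""
    def is_bad(v):
        # None stands in for NA; NaN, Inf, and sentinel codes are also bad.
        return v is None or v in sentinels or math.isnan(v) or math.isinf(v)

    good = [v for v in values if not is_bad(v)]
    fill = sum(good) / len(good) if good else 0.0
    cleaned = [fill if is_bad(v) else float(v) for v in values]
    was_bad = [1 if is_bad(v) else 0 for v in values]
    return cleaned, was_bad


def encode_levels(train_levels, new_values):
    """Indicator-encode new values against the levels seen in training.
    A level that never appeared in training becomes an all-zero row
    instead of raising an error."""
    levels = sorted(set(train_levels))
    return [[1 if v == lev else 0 for lev in levels] for v in new_values]
```

For example, `treat_numeric([1.0, None, 3.0])` fills the missing entry with the mean of the other two and flags it in the indicator column, so a downstream model can still learn from the fact that the value was missing.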

Link:

http://www.win-vector.com/blog/2014/08/vtreat-designing-a-package-for-variable-treatment/

From feeds:

Statistics and Visualization » Win-Vector Blog

Tags:

coding, data science, math programming, practical data science, pragmatic data science, pragmatic machine learning, statistics, data cleaning, data treatment, impact coding, missing data, missing levels, practical data science with r, r, variable treatment

Authors:

Nina Zumel

Date tagged:

09/04/2014, 23:20

Date published:

08/07/2014, 20:39