Vtreat: designing a package for variable treatment

Win-Vector Blog 2014-08-08

Summary:

When you apply machine learning algorithms on a regular basis, on a wide variety of data sets, you find that certain data issues come up again and again: missing values (NA or blanks); problematic numerical values (Inf, NaN, sentinel values like 999999999 or -1); valid categorical levels that don’t appear in the training data (especially […]

Link:

http://www.win-vector.com/blog/2014/08/vtreat-designing-a-package-for-variable-treatment/?utm_source=rss&utm_medium=rss&utm_campaign=vtreat-designing-a-package-for-variable-treatment

From feeds:

Statistics and Visualization » Win-Vector Blog

Tags:

coding data science math programming practical data science pragmatic data science pragmatic machine learning statistics data cleaning data treatment impact coding missing data missing levels r variable treatment practical data science with r

Authors:

Nina Zumel

Date tagged:

08/08/2014, 12:20

Date published:

08/07/2014, 20:39