New Getting Started with vtreat Documentation
Win-Vector Blog 2019-09-01
Win-Vector LLC’s Dr. Nina Zumel has just released some new vtreat documentation.
vtreat is an all-in-one-step data preparation system that helps defend your machine learning algorithms from:
- Missing values
- Large cardinality categorical variables
- Novel levels from categorical variables
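To make these three hazards concrete, here is a minimal hand-rolled sketch in plain Python. It is not vtreat's implementation or API: the function names `fit_plan` and `encode`, the `min_count` threshold, and the indicator column names are all hypothetical, chosen only to show the kind of problem vtreat is meant to handle for you.

```python
from collections import Counter

def fit_plan(column, min_count=2):
    """Return the set of levels seen at least min_count times in training data.

    Illustrative only: pooling rarer levels is one simple way to keep a
    large-cardinality categorical variable from producing too many columns.
    """
    counts = Counter(v for v in column if v is not None)
    return {level for level, n in counts.items() if n >= min_count}

def encode(value, kept_levels):
    """Map one raw value to a dict of 0/1 indicator features."""
    row = {f"is_{level}": 0 for level in sorted(kept_levels)}
    row["is_missing"] = 0
    row["is_rare_or_novel"] = 0
    if value is None:
        # Missing value: flag it explicitly instead of failing downstream.
        row["is_missing"] = 1
    elif value in kept_levels:
        row[f"is_{value}"] = 1
    else:
        # Rare in training, or never seen in training (a novel level).
        row["is_rare_or_novel"] = 1
    return row

# Toy training column: "c" is rare, and one entry is missing.
train = ["a", "a", "b", "b", "c", None]
kept = fit_plan(train)

encoded_known = encode("a", kept)      # a level kept from training
encoded_missing = encode(None, kept)   # a missing value
encoded_novel = encode("z", kept)      # a level never seen in training
```

Every input, including a missing value or a level that never appeared in training, maps to the same fixed set of numeric columns, so a downstream model always receives data in the shape it was trained on.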
I hoped she could get the Python vtreat documentation up to parity with the R vtreat documentation. But I think she really hit the ball out of the park, and went way past that.
The new documentation consists of three “getting started” guides. These guides deliberately overlap, so you don’t have to read them all. Just read the one suited to your problem and go.
The new guides:
- Using vtreat with Classification Problems
- Using vtreat with Regression Problems
- Using vtreat with Unsupervised Problems and Non-Y-aware data treatment
Perhaps we can back-port the new guides to the R version at some point.