More on preparing data
Win-Vector Blog 2016-03-22
The Microsoft Data Science User Group just sponsored Dr. Nina Zumel’s presentation “Preparing Data for Analysis Using R”. Microsoft saw Win-Vector LLC’s ODSC West 2015 presentation “Prepping Data for Analysis using R” and generously offered to sponsor improving it and disseminating it to a wider audience.
We feel Nina really hit it out of the park, with over 400 new live viewers. Read more for links to even more free materials!
Microsoft has generously sponsored the following:
- The refinement and presentation of a new, free 45-minute video lecture on preparing data for analytic work. The material was originally presented live; register here to get a link to “join and view” the recorded presentation.
- Production of a small ebook/white-paper on the topic of preparing data for predictive analytics work, released under a Creative Commons Attribution-ShareAlike 4.0 International License.
These are really great materials and we will be promoting and distributing them widely.
Nina emphasized teaching the principles of data treatment and cleaning (frankly an under-emphasized task). She also mentioned vtreat, a free R library supplied by Win-Vector LLC that automates a great number of the steps in a principled and statistically sound manner. Because her lecture is likely to attract more interest in the vtreat library, we have tuned up the vtreat documentation a bit and made it available as pre-rendered HTML (in addition to the normal vignette distribution). Of particular interest: we have finally enumerated all the variable types that vtreat uses to re-encode your data.