Prepping Data for Analysis using R
Win-Vector Blog 2016-01-25
Nina and I are proud to share our lecture: “Prepping Data for Analysis using R” from ODSC West 2015.
Nina Zumel and John Mount, ODSC West 2015
The lecture runs about 90 minutes and covers much of the theory behind the vtreat data preparation library.
We also have a GitHub repository containing all the lecture materials here.
Nina’s preview still (shown below) is one of my favorite slides. I think it really sets out how to think about novel levels (string values encountered during scoring that were not seen during training) in a nice, problem-driven way before getting into the messy math (such as unknown frequency estimation).
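
To make the novel-level problem concrete, here is a minimal base R sketch. It is not vtreat's implementation and not taken from the lecture: the data, the column name `color`, and the helper `encode_levels` are all hypothetical, made up for illustration. It shows one simple policy, where a value never seen in training is flagged and encoded as all zeros across the training-level indicator columns.

```r
# Hypothetical training data: one categorical variable with three levels.
train <- data.frame(color = c("red", "green", "blue", "red"),
                    stringsAsFactors = FALSE)

# Hypothetical scoring data: "purple" never appeared during training.
score <- data.frame(color = c("green", "purple"),
                    stringsAsFactors = FALSE)

# The set of levels is fixed at training time.
train_levels <- sort(unique(train$color))

# Illustrative helper (an assumption, not a vtreat function):
# build one 0/1 indicator column per training level.
encode_levels <- function(x, levels) {
  m <- sapply(levels, function(lv) as.numeric(x == lv))
  # sapply returns a plain vector when x has length 1, so force a matrix.
  m <- matrix(m, nrow = length(x),
              dimnames = list(NULL, paste0("color_", levels)))
  as.data.frame(m)
}

# Flag values that were not seen during training.
novel <- !(score$color %in% train_levels)

# Novel levels match no indicator column, so their rows are all zeros.
encoded <- encode_levels(score$color, train_levels)

print(novel)
print(encoded)
```

Encoding against the training levels, rather than re-deriving levels from the scoring data, keeps the column layout identical between training and scoring, which is what a fitted model expects. Whether all zeros is the right value for a novel level is exactly the kind of question the slide raises.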