Prepping Data for Analysis using R

Win-Vector Blog 2016-01-25

Nina and I are proud to share our lecture: “Prepping Data for Analysis using R” from ODSC West 2015.

Nina Zumel and John Mount, ODSC West 2015

It runs about 90 minutes and covers much of the theory behind the vtreat data preparation library.

We also have a GitHub repository containing all the lecture materials.

Nina’s preview still (shown below) is one of my favorite slides. I think it frames novel levels (string values encountered during scoring that were not seen during training) in a nice, problem-driven way before getting into the messy math (such as unknown frequency estimation).

[Slide image: ODSC 2015, part 1, slide 001]
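To make the novel-level problem concrete, here is a minimal base-R sketch. It is not vtreat's API or its method: `encode_levels()` is a hypothetical helper that one-hot encodes a character column against only the levels seen in training, and flags any scoring-time value that was never seen. The example data is made up for illustration.

```r
# Hypothetical helper (NOT part of vtreat): one-hot encode a character
# vector using only the levels observed during training. A value not seen
# in training (a "novel level") gets all-zero indicators plus novel = 1.
encode_levels <- function(x, train_levels) {
  # outer() compares every value against every training level;
  # multiplying the logical matrix by 1L turns it into 0/1 integers.
  out <- outer(x, train_levels, "==") * 1L
  colnames(out) <- paste0("lev_", train_levels)
  novel <- as.integer(!(x %in% train_levels))
  cbind(out, novel = novel)
}

train <- c("a", "b", "a", "c")
train_levels <- sort(unique(train))

score <- c("b", "d", "a")  # "d" never appeared in training
enc <- encode_levels(score, train_levels)
print(enc)

# For contrast: a plain factor built on the training levels silently
# turns the novel value into NA.
print(factor(score, levels = train_levels))
```

Mapping a novel level to all-zero indicators plus an explicit flag is only one simple policy; it keeps scoring from failing, but says nothing about how a model should weight such rows, which is where the harder estimation questions come in.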