Upcoming speaking engagements

Win-Vector Blog 2018-04-19

I have a couple of public appearances coming up soon.

vtreat

Preparing Datasets – The Ugly Truth & Some Solutions is a great idea of Jim Porzak’s. Jim will speak on the problems one is likely to encounter when trying to use real-world data for predictive modeling, and then I will speak on how the vtreat package helps address these issues. vtreat systematizes a number of routine, domain-independent data repairs and preparations, leaving you more time to work on important domain-specific issues (plus it has citable documentation, which helps keep your methodology section short).

vtreat is the best way to prepare messy real world data for predictive modeling.

rquery

rquery: a Query Generator for Working With SQL Data is an introduction to the rquery query generator system. rquery is a new R package that builds “pipe-able SQL” and includes a number of powerful data operators and analyses, along with some neat features such as query pipeline diagrams.

We think rquery (plus cdata) is going to be the best way (easiest to learn, most expressive, easiest to maintain, and most performant) to use R to manipulate data at scale (SQL databases and Spark).