Upcoming speaking engagements
Win-Vector Blog 2018-04-19
I have a couple of public appearances coming up soon.
- The East Bay R Language Beginners Group: Preparing Datasets – The Ugly Truth & Some Solutions, Tuesday, May 1, 2018 at Robert Half Technologies, 1999 Harrison Street, Oakland, CA, 94612.
- Official May 2018 BARUG Meeting: rquery: a Query Generator for Working With SQL Data, Tuesday, May 8, 2018 at Intuit, Building 20, 2600 Marine Way, Mountain View, CA.
vtreat
Preparing Datasets – The Ugly Truth & Some Solutions is a great idea of Jim Porzak’s. Jim will speak on problems one is likely to encounter in trying to use real world data for predictive modeling and then I will speak on how the vtreat package helps address these issues. vtreat systematizes a number of routine domain independent data repairs and preparations, leaving you more time to work on important domain specific issues (plus it has citable documentation, helping make your methodology section smaller).
vtreat is the best way to prepare messy real world data for predictive modeling.
rquery
rquery: a Query Generator for Working With SQL Data is an introduction to the rquery query generator system. rquery is a new R package that builds “pipe-able SQL” and includes a number of very powerful data operators and analyses. It also has some very neat features, including query pipeline diagrams.
We think rquery (plus cdata) is going to be the best way (easiest to learn, most expressive, easiest to maintain, and most performant) to use R to manipulate data at scale (SQL databases and Spark).