Using partial pooling when preparing data for machine learning applications

Win-Vector Blog 2018-04-18

Summary:

Geoffrey Simmons writes: I reached out to John Mount and Nina Zumel over at Win-Vector with a suggestion for their vtreat package, which automates many common challenges in preparing data for machine learning applications. The default behavior for impact coding high-cardinality variables had been a naive Bayes approach, which I found to be problematic due to its multi-modal output (assigning […]

The post Using partial pooling when preparing data for machine learning applications appeared first on Statistical Modeling, Causal Inference, and Social Science.

Link:

http://andrewgelman.com/2018/04/18/using-partial-pooling-preparing-data-machine-learning-applications/

From feeds:

Statistics and Visualization » Statistical Modeling, Causal Inference, and Social Science

Authors:

Andrew

Date tagged:

04/18/2018, 09:09

Date published:

04/18/2018, 09:01