Bad Bayes: an example of why you need hold-out testing

Win-Vector Blog 2014-02-02

Summary:

We demonstrate a dataset that causes many otherwise good machine learning algorithms to overfit badly. The example is designed to imitate a situation common in predictive-analytics applications of natural language processing. In such applications you often build a model from many rare text features. These rare text features are often nearly unique k-grams […]
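
The effect described in the summary can be sketched with a small synthetic experiment. This is not the post's dataset or code: the sizes, the feature rate, the seed, and every function name below are assumptions chosen for illustration. Labels are pure coin flips, so no model can truly predict them, yet a naive Bayes classifier trained on many rare binary features still looks accurate on its own training rows. Only the held-out rows show that it has learned nothing.

```python
import math
import random

def make_rows(n_rows, n_features, p_on, rng):
    """Build (active feature set, label) rows; labels are independent coin flips."""
    rows = []
    for _ in range(n_rows):
        active = {j for j in range(n_features) if rng.random() < p_on}
        rows.append((active, rng.randrange(2)))
    return rows

def fit_naive_bayes(rows, n_features, smoothing=1.0):
    """Fit a Bernoulli naive Bayes model with Laplace smoothing.

    Returns (base, delta): base is the log-odds of a row with no active
    features, delta[j] is the change in log-odds when feature j is active.
    """
    counts = [[0] * n_features, [0] * n_features]
    n_class = [0, 0]
    for active, y in rows:
        n_class[y] += 1
        for j in active:
            counts[y][j] += 1

    def p(y, j):
        # Smoothed estimate of P(feature j is on | class y), strictly in (0, 1).
        return (counts[y][j] + smoothing) / (n_class[y] + 2 * smoothing)

    on = [math.log(p(1, j) / p(0, j)) for j in range(n_features)]
    off = [math.log((1 - p(1, j)) / (1 - p(0, j))) for j in range(n_features)]
    prior = math.log((n_class[1] + smoothing) / (n_class[0] + smoothing))
    return prior + sum(off), [on[j] - off[j] for j in range(n_features)]

def accuracy(model, rows):
    """Fraction of rows whose predicted class (log-odds > 0 means 1) is correct."""
    base, delta = model
    hits = 0
    for active, y in rows:
        score = base + sum(delta[j] for j in active)
        hits += int((score > 0) == (y == 1))
    return hits / len(rows)

# Hypothetical sizes: far more features than rows, each feature rarely on.
rng = random.Random(2014)
N_FEATURES = 2000
train = make_rows(200, N_FEATURES, 0.01, rng)
test = make_rows(200, N_FEATURES, 0.01, rng)

model = fit_naive_bayes(train, N_FEATURES)
train_acc = accuracy(model, train)
test_acc = accuracy(model, test)
print(f"training accuracy: {train_acc:.2f}")
print(f"hold-out accuracy: {test_acc:.2f}")
```

Because each feature appears in only a handful of training rows, every row's own label dominates the evidence attached to its features, so the model largely memorizes the training set. The hold-out rows were drawn the same way but never seen during fitting, which is why their score is the honest one.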

Authors:

John Mount

Date tagged:

02/02/2014, 11:20

Date published:

02/01/2014, 11:14
