Bad Bayes: an example of why you need hold-out testing

Win-Vector Blog 2014-02-02

Summary:

We demonstrate a dataset that causes many good machine learning algorithms to overfit badly. The example is designed to imitate a situation common in predictive-analytics applications of natural language processing, where a model is often built from many rare text features. These rare text features are often nearly unique k-grams […]
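
The effect the summary describes can be reproduced in a few lines. The sketch below is not the post's dataset or code (the post is tagged as using R; Python is used here only for illustration). It assumes 2000 binary features, 20 of them active per row, coin-flip labels that carry no signal, and a hand-rolled Bernoulli naive Bayes with Laplace smoothing. All names and sizes (`make_rows`, `fit_naive_bayes`, `N_FEATURES`, and so on) are hypothetical choices for this example.

```python
import math
import random


def make_rows(rng, n_rows, n_features, n_active):
    """Each row is a set of active feature ids; labels are coin flips (pure noise)."""
    rows = [set(rng.sample(range(n_features), n_active)) for _ in range(n_rows)]
    labels = [rng.randint(0, 1) for _ in range(n_rows)]
    return rows, labels


def fit_naive_bayes(rows, labels, n_features):
    """Bernoulli naive Bayes, returned as (base log-odds, per-feature log-odds shift)."""
    n = [labels.count(0), labels.count(1)]
    counts = [[0] * n_features, [0] * n_features]
    for row, y in zip(rows, labels):
        for j in row:
            counts[y][j] += 1
    # Laplace-smoothed estimate of P(feature present | class).
    p = [[(counts[y][j] + 1) / (n[y] + 2) for j in range(n_features)] for y in (0, 1)]
    # Log-odds of class 1 for a row with no active features:
    # smoothed prior odds plus every feature's "absent" term.
    base = math.log((n[1] + 1) / (n[0] + 1))
    base += sum(math.log(1 - p[1][j]) - math.log(1 - p[0][j]) for j in range(n_features))
    # Change in log-odds when feature j is present instead of absent.
    delta = [
        math.log(p[1][j] / (1 - p[1][j])) - math.log(p[0][j] / (1 - p[0][j]))
        for j in range(n_features)
    ]
    return base, delta


def accuracy(model, rows, labels):
    """Fraction of rows whose predicted class (log-odds > 0 means class 1) matches the label."""
    base, delta = model
    hits = 0
    for row, y in zip(rows, labels):
        score = base + sum(delta[j] for j in row)
        hits += int((score > 0) == (y == 1))
    return hits / len(rows)


rng = random.Random(2014)  # fixed seed so the run is repeatable
N_FEATURES, N_ACTIVE = 2000, 20  # assumed sizes: each feature shows up ~2 times in training

train_rows, train_labels = make_rows(rng, 200, N_FEATURES, N_ACTIVE)
test_rows, test_labels = make_rows(rng, 1000, N_FEATURES, N_ACTIVE)

model = fit_naive_bayes(train_rows, train_labels, N_FEATURES)
train_acc = accuracy(model, train_rows, train_labels)
test_acc = accuracy(model, test_rows, test_labels)

print(f"training accuracy: {train_acc:.3f}")
print(f"hold-out accuracy: {test_acc:.3f}")
```

Because the labels are independent of the features, no model can do better than chance on new rows. Under these assumptions the training accuracy should still come out close to 1, since each rare feature is seen only a couple of times and largely memorizes the labels of the rows it appears in, while the hold-out accuracy should stay near 0.5. Only the hold-out number reveals that nothing was learned.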

Link:

http://www.win-vector.com/blog/2014/02/bad-bayes-an-example-of-why-you-need-hold-out-testing/?utm_source=rss&utm_medium=rss&utm_campaign=bad-bayes-an-example-of-why-you-need-hold-out-testing

From feeds:

Statistics and Visualization » Win-Vector Blog

Tags:

data science, pragmatic machine learning, statistics, tutorials, cross-validation, fitting noise, generalization error, hold-out testing, naive bayes, natural language processing, nlp, r

Authors:

John Mount

Date tagged:

02/02/2014, 11:20

Date published:

02/01/2014, 11:14