Bad Bayes: an example of why you need hold-out testing
Win-Vector Blog 2014-02-02
Summary:
We demonstrate a dataset that causes many good machine learning algorithms to overfit badly. The example is designed to imitate a common situation in predictive analytics applied to natural language processing. In this type of application you are often building a model from many rare text features. The rare text features are often nearly unique k-grams […]
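To make the effect concrete, here is a minimal sketch in Python. It is an illustration under our own assumptions, not the dataset or code from the original post. The vocabulary size, tokens per row, row counts, and the helper names (`make_rows`, `fit_naive_bayes`, `predict`, `accuracy`) are all hypothetical. Each row gets a handful of rare random tokens and a coin-flip label, so there is no signal to learn. A small hand-written Naive Bayes classifier with add-one smoothing is then scored on its own training rows and on hold-out rows.

```python
import math
import random
from collections import Counter

def make_rows(rng, n_rows, vocab_size=2000, tokens_per_row=20):
    """Rows of rare random tokens with coin-flip labels (no real signal)."""
    rows = []
    for _ in range(n_rows):
        tokens = rng.sample(range(vocab_size), tokens_per_row)
        label = rng.random() < 0.5
        rows.append((tokens, label))
    return rows

def fit_naive_bayes(rows, vocab_size):
    """Collect per-class token counts; smoothing is applied at predict time."""
    counts = {True: Counter(), False: Counter()}
    n_rows = {True: 0, False: 0}
    for tokens, label in rows:
        counts[label].update(tokens)
        n_rows[label] += 1
    totals = {c: sum(counts[c].values()) for c in counts}
    return counts, totals, n_rows, vocab_size

def predict(model, tokens):
    """Return the class with the larger smoothed log-posterior score."""
    counts, totals, n_rows, vocab_size = model
    score = {}
    for c in (True, False):
        # Smoothed class prior.
        s = math.log((n_rows[c] + 1) / (n_rows[True] + n_rows[False] + 2))
        # Add-one (Laplace) smoothed token likelihoods.
        for t in tokens:
            s += math.log((counts[c][t] + 1) / (totals[c] + vocab_size))
        score[c] = s
    return score[True] > score[False]

def accuracy(model, rows):
    return sum(predict(model, t) == y for t, y in rows) / len(rows)

rng = random.Random(2014)          # fixed seed so the run is repeatable
train = make_rows(rng, 200)        # rows the model is fit on
holdout = make_rows(rng, 1000)     # rows the model never sees during fitting
model = fit_naive_bayes(train, vocab_size=2000)

train_acc = accuracy(model, train)
holdout_acc = accuracy(model, holdout)
print(f"training accuracy: {train_acc:.3f}")
print(f"hold-out accuracy: {holdout_acc:.3f}")
```

Because each token shows up in only a couple of training rows, every row's own label dominates the counts for its tokens, and the model can look excellent on the data it was fit on. The hold-out labels are independent coin flips, so accuracy there should sit near chance. Only the hold-out number tells you the model has learned nothing.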