Reading Predictive Analytics

Numbers Rule Your World 2013-06-24

Predictive Analytics by Eric Siegel (link) was published earlier this year. Siegel is a consultant and organizer of a series of popular industry conferences, which I attend with some regularity. I recommend this book for readers who want to understand the current state of “data science” at a deeper level than the New York Times’s coverage, while staying nonmathematical. If you want to measure against my own writing, then Siegel spends more time addressing “how” than I typically do on this blog. He also has a fondness for lists, tables, quotations, pictures, and turns of phrase. I did mention lists, and lists within sentences.

Reading this book is like worming through the crowd at a conference center in New York, or Boston, or San Francisco, where Siegel’s meetings are usually held. Siegel is like your Kevin Bacon, your link to practitioners of the art/science of data-driven business decision-making. (I just gave the definition, rather than the name, of the field, which has as many names as cheongsams worn by Maggie Cheung in In the Mood for Love. Siegel selected “predictive analytics,” while you may also see “data science,” “data mining,” “machine learning,” “statistical learning,” “knowledge discovery,” “statistical modeling,” “business analytics,” etc.)

***

Predictive Analytics paints an accurate picture of the applications and discourse circa the 2000s. Chapter 2 describes a model used by the retailer Target to find potential customers who are pregnant women. This application was later picked up by Charles Duhigg in The Power of Habit (link), and in a New York Times Magazine article, which I examined in a previous post. (Disclosure: I have further comments on pregnancy targeting in my forthcoming book.) Also in Chapter 2 is a description of how Hewlett Packard analyzes data on its employees, a nice companion piece to the recent New York Times profile of Google’s SVP of “people operations.” See my post here. The third example in Chapter 2 concerns local police using data to find criminals.

Social media analytics, possibly the hippest corner of the industry, gets star billing in Chapter 3. Two researchers summarized LiveJournal blog posts into an “Anxiety Index,” which they claimed predicted the S&P 500. Chapter 4 contains an extensive description of a popular technique known as “decision trees,” applied to financial risk management. For those who read Chapter 2 of Numbers Rule Your World, this section provides more technical details on risk scoring. Some keywords to look out for are overfitting (called overlearning), using test datasets to evaluate accuracy, and Occam’s razor.
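For readers who like to see the keywords in action, here is a minimal sketch of overfitting and of why a held-out test dataset matters. It is my own toy illustration, not Siegel's example: the data are simulated (a single made-up risk score, with 20 percent of the outcomes flipped at random), and the small tree builder is hypothetical code written for this post, not any library's implementation. A fully grown tree memorizes the training data, noise included, while a one-split tree (Occam's razor at work) tends to hold up better on data it has not seen.

```python
import random

def make_data(n, rng, noise=0.2):
    """Simulated (assumed) data: score x in [0, 1); true rule is y = 1 when x > 0.5."""
    rows = []
    for _ in range(n):
        x = rng.random()
        y = int(x > 0.5)
        if rng.random() < noise:      # flip some labels to mimic noise
            y = 1 - y
        rows.append((x, y))
    return rows

def gini(ones, total):
    p = ones / total
    return 2 * p * (1 - p)

def build_tree(rows, max_depth):
    """Return a leaf label (0 or 1) or a tuple (threshold, left_subtree, right_subtree)."""
    ones = sum(y for _, y in rows)
    leaf = int(2 * ones >= len(rows))             # majority label
    if max_depth == 0 or ones in (0, len(rows)):  # depth limit hit, or node is pure
        return leaf
    rows = sorted(rows)
    best = None                                   # (impurity, threshold, split index)
    left_ones = 0
    for i in range(1, len(rows)):
        left_ones += rows[i - 1][1]
        if rows[i - 1][0] == rows[i][0]:
            continue                              # cannot split between equal scores
        n_right = len(rows) - i
        score = i * gini(left_ones, i) + n_right * gini(ones - left_ones, n_right)
        if best is None or score < best[0]:
            best = (score, rows[i - 1][0], i)
    if best is None:
        return leaf
    _, threshold, i = best
    return (threshold,
            build_tree(rows[:i], max_depth - 1),
            build_tree(rows[i:], max_depth - 1))

def predict(tree, x):
    while isinstance(tree, tuple):
        threshold, left, right = tree
        tree = left if x <= threshold else right
    return tree

def accuracy(tree, rows):
    return sum(predict(tree, x) == y for x, y in rows) / len(rows)

rng = random.Random(0)
train, test = make_data(300, rng), make_data(2000, rng)

deep = build_tree(train, float("inf"))   # grown until every leaf is pure
stump = build_tree(train, 1)             # a single split

for name, tree in [("deep tree", deep), ("one split", stump)]:
    print(name, "train:", accuracy(tree, train), "test:", accuracy(tree, test))
```

The deep tree scores perfectly on the training data because it has learned the noise. Only the test rows, which played no part in building either tree, reveal which model generalizes.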

Chapter 5 covers ensemble models, a relatively new technique with broad applications. Instead of taking the traditional route of developing one “best” predictor, the analyst conducts a poll of a set of predictors: sort of a wisdom-of-crowds approach. The winning team in the Netflix Prize, in which teams competed to improve the “accuracy” of Netflix queue recommendations, used an ensemble.
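The polling idea can be sketched in a few lines. This is a simulation under a strong assumption that I am making purely for illustration: 15 hypothetical predictors, each right 65 percent of the time, with errors independent of one another. Real predictors are correlated, and this is not what the Netflix Prize winners built, so treat it as a cartoon of the principle and nothing more.

```python
import random

def noisy_predictor(truth, skill, rng):
    """Return the true label with probability `skill`, else the wrong label."""
    return truth if rng.random() < skill else 1 - truth

def majority_vote(votes):
    """Poll the predictors; an odd number of voters avoids ties."""
    return int(2 * sum(votes) > len(votes))

rng = random.Random(1)
n_cases, n_predictors, skill = 2000, 15, 0.65   # all assumed values

truths = [rng.randint(0, 1) for _ in range(n_cases)]
votes = [[noisy_predictor(t, skill, rng) for _ in range(n_predictors)]
         for t in truths]

# Accuracy of each predictor on its own
individual = [sum(v[j] == t for v, t in zip(votes, truths)) / n_cases
              for j in range(n_predictors)]

# Accuracy of the poll
ensemble = sum(majority_vote(v) == t for v, t in zip(votes, truths)) / n_cases

print("best single predictor:", max(individual))
print("majority vote:", ensemble)
```

The gain comes entirely from the predictors making different mistakes. If they all erred on the same cases, polling them would add nothing.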

Chapter 7 introduces net lift models, an unresolved but important area of business analytics. Take the example of Time Warner Cable wanting to send special offers to “vulnerable” customers in the hope of retaining them. Traditional predictive models find the customers who are most likely to cancel their service. The trouble is that special offers are very expensive, and some of those customers would not require this incentive in order to renew. A “net lift” model is more precise: it targets only those customers who are likely to cancel and likely to renew only if Time Warner makes a special offer. Technically, the latter problem is much harder to solve.
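To make the distinction concrete, here is a small sketch with four made-up customers. The renewal probabilities are invented for illustration; I am assuming we somehow already know each customer's chance of renewing with and without the offer, which is exactly the hard part in practice, since no customer can be observed both ways.

```python
# Hypothetical customers: (name, P(renew | no offer), P(renew | offer)).
# All numbers are invented for illustration.
customers = [
    ("loyal",       0.95, 0.96),
    ("persuadable", 0.30, 0.80),
    ("lost_cause",  0.10, 0.12),
    ("lukewarm",    0.60, 0.65),
]

def churn_risk(customer):
    """Traditional score: chance of cancelling if we do nothing."""
    _, p_no_offer, _ = customer
    return 1.0 - p_no_offer

def net_lift(customer):
    """Net lift score: how much the offer raises the chance of renewal."""
    _, p_no_offer, p_offer = customer
    return p_offer - p_no_offer

by_churn = sorted(customers, key=churn_risk, reverse=True)
by_lift = sorted(customers, key=net_lift, reverse=True)

top_churn = by_churn[0][0]   # who a traditional churn model would target first
top_lift = by_lift[0][0]     # who a net lift model would target first

print("churn model targets first:", top_churn)
print("net lift model targets first:", top_lift)
```

The churn ranking puts the customer who will leave no matter what at the top of the list, while the net lift ranking spends the expensive offer where it changes the outcome.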

***

Needless to say, the price of the book is a fraction of the conference fee. While the overall tone is optimistic, Siegel does not shy away from discussing the limitations of data analytics. This I find to be a virtue, a relief from the relentless hype that has enveloped this field of work. I’d like to end this review with a quote (p. 201):

Commanding a computer to learn is like teaching a blindfolded monkey to design a fashion diva’s gown. The computer knows nothing. It has no notion of the meaning behind the data, the concept of what a mortgage, salary or even a house is. The numbers are just numbers. Even clues like “$” and “%” don’t mean anything to the machine. It’s a blind, mindless automation stuck in a box during its first day on the job.