Data Science for Business: Interview with NYU Stern’s Foster Provost
Data & Society / saved 2014-01-08
Summary:
Data Science for Business: What You Need to Know About Data Mining and Data-Analytic Thinking, written by Foster Provost and Tom Fawcett, has struck a chord with executives keen to explore the possibilities of data.
As a professor at New York University’s Stern School of Business, Provost noticed while teaching his data classes that there were no books he could recommend to his MBA students that taught the fundamental principles. Existing books were either too granular and focused on the math, or too high-level. Provost stopped using a textbook altogether and began writing class notes, which he eventually compiled into a book with his former colleague Tom Fawcett. Data Science for Business goes beyond those notes: it is designed to equip business executives with the knowledge they need to manage data scientists, gather data, and extract value from it.
We had the opportunity to speak with Foster Provost about the value of sharing data knowledge across an organization, leveraging data as a strategic asset, and the privacy implications of data use, among other topics. He also gave us a crash course on some of the fundamental concepts behind data science. What follows is an edited version of our transcribed conversation.
NYC Media Lab: This book is different from other books about data science. How?
Foster Provost: It’s different because the existing books either focus on the algorithms and the math, which this book doesn’t, or they’re very general books that aren’t really for the person who seriously wants to learn the material. This book focuses on the fundamental principles underlying data science and data-analytic thinking.
NYC Media Lab: Why is it important for everyone to have a common understanding of those principles?
Foster Provost: I’m not sure it’s important for everyone. It’s important for people in organizations that would like to take advantage of all the data that’s available these days. To succeed at a data-driven project, it’s not enough to have somebody who is a data scientist. You need a whole team that understands the fundamental principles. Data science projects are generally successful when you have managers who understand the fundamental principles, a tech team that understands the fundamental principles, and, of course, top-notch data scientists.
NYC Media Lab: We’re talking about a shared language, really.
Foster Provost: A shared language and a shared way of thinking.
NYC Media Lab: What’s different about the data-oriented way of thinking versus the common managerial principles that might be present?
Foster Provost: I think there are two aspects to this. One is that we think of solving problems data-analytically as a craft. A mature craft has a process that its craftspeople follow so they can do things consistently well. Likewise, there are processes for extracting useful knowledge from your data in order to improve decision making. Some of the fundamental principles in the book are these frameworks for how to think, specifically how to do data-analytic thinking.
The second aspect that makes it different from much managerial thinking (and I’m not sure I can say there is just one kind of managerial thinking) is that data science projects often have a large exploratory component: when you start, you don’t know what you’re going to get. In that way, they’re much more like R&D than like, let’s say, a traditional engineering project, and that leads to certain ways of managing them. For instance, if you have higher-risk projects, you might want to manage them as a portfolio rather than thinking, “I will do this one and it will succeed,” because you don’t necessarily know whether any single one will succeed.
NYC Media Lab: We’ll come back to that, but I want to ask you one question first. It’s somewhat broad: why should we think of ourselves as being in the era of Big Data 1.0?
Foster Provost: In the introductory chapter, we draw an analogy between what’s going on with big data now and what went on a little more than a decade ago with the web. Early on, in what we might think of as Web 1.0, firms were focused on simply establishing a web presence: getting their e-commerce systems running, bringing the money in, and responding to customers quickly. They were doing a lot of block-and-tackle work just to get their web systems working.
Then, once firms had solid systems, they started to think, “What can we do differently now that we have web systems?” So you see the rise of the voice of the consumer. Amazon was precocious in this respect, with customer reviews, and even comments on reviews, and with using the web to make recommendations and things like that.
I think we have the same situation with big data now. Big Data 1.0 is all about: can we even store our data, process our data, and so on? And once we get our ability to be able to store and process and manage big data, then firms will start to