How does Practical Data Science with R stand out?
Win-Vector Blog 2014-09-04
There are a lot of good books on statistics, machine learning, analytics, and R. So it is valid to ask: how does Practical Data Science with R stand out? Why should a data scientist or an aspiring data scientist buy it?
We admit, it isn’t the only book we own. Some relevant books from the Win-Vector LLC company library include:
And a few more from our digital bookshelf:
“Practical Data Science with R” stands out in that it:
- Concentrates on the process of data science (working with teams and tools to deploy predictive models into production).
- Spends more time on how to acquire and load non-trivial data sets (including working with SQL, CSV files, and Excel).
- Spends more time on data treatment (which allows standard modeling methods to be used in new and powerful ways).
- Deals with real world issues such as setting expectations and producing presentations (not strictly a part of machine learning, but very much a part of data science).
- Includes free code and data to reproduce almost every analysis and graph in the book (and there are a lot of them).
- Spends time explaining data preparation steps (many data scientists say “you spend 90% of your time preparing your data for analysis”).
- Prepares you to use many of these other books.
Work through “Practical Data Science with R” and you will learn a lot about the practice of data science.
Why do we need a book on data science? Some ask “is data science just a fad?” and “we have had statistics for hundreds of years, so why do we need data science?” Obviously the term “data science” is near the peak of its hype cycle. But data science is a real and important discipline. It differs from statistics (itself an important tool that data scientists need) in that it involves a lot more programming, a lot more work on data architecture, a lot more tools, and a lot more domain/client empathy. Statisticians already do a lot of programming, but data scientists can end up doing even more. I would say one of the assumptions of data science is that there is a client (either real or imagined) that the data scientist is working for (similar to the customer role in agile development). Data scientists also tend to use a large number of tools (you can start with R, but depending on your client’s needs you may eventually need to work with many more). We feel that there is a significant gap in the teaching of the gestalt of data science, one that “Practical Data Science with R” fills.
The methods “Practical Data Science with R” teaches are entirely based on free and open source software (R, RStudio, SQuirreL SQL, H2 DB, and others) and are cross platform (running on OSX, Linux, and Windows). So once you buy the book, you are ready to start work on significant projects.
If you feel “Practical Data Science with R” doesn’t go deep enough on foundational topics (such as R itself, statistics or SQL) we suggest consulting one or more of the following in parallel:
- Kabacoff “R in Action” 2nd edition (our current favorite book about R and statistics).
- Freedman, Pisani, Purves “Statistics” 4th edition (good writing on statistics).
- Celko “SQL for Smarties” 4th edition (clear writing about advanced query techniques; learn SQL before you try big data tools such as Hive).
“Practical Data Science with R” emphasizes the business questions (such as determining what type of score is actually useful for your client) and assumes machine learning is something you can delegate to ready-made algorithms (which is the main reason to use R). If you want to move on to machine learning algorithm design and analysis try:
- Hastie, Tibshirani, Friedman “The Elements of Statistical Learning” 2nd edition (the book on analyzing machine learning algorithms).
- James, Witten, Hastie, Tibshirani “An Introduction to Statistical Learning: with Applications in R” (an R example oriented introduction to statistical machine learning).
- Kuhn, Johnson “Applied Predictive Modeling” (theory, and worked examples in R, of building and tuning predictive analytic models).
If you want interesting descriptions of data science (something to share with your boss or colleagues) we suggest checking out:
- Provost, Fawcett “Data Science for Business” (a description of data science for “people who will be working with data scientists”).
- O’Neil, Schutt “Doing Data Science” (guest presentations from a data science class bound together as a set of essays).
Good books, in the mind of a good reader, amplify each other (not detract from each other). The fact that Celko is an excellent book on SQL doesn’t lessen Hastie/Tibshirani/Friedman’s authoritativeness on statistical machine learning. Yet these are all topics that are relevant to data science.
All of that being said: we think “Practical Data Science with R” is one of the best introductions to data science. “Practical Data Science with R” attempts to convey the actual process of data science through worked examples (that may include programming, SQL, machine learning, and presenting to clients). The data scientist may not equally enjoy all of the sub-steps and sub-specialties, but is expected (by discerning clients) to do (or delegate) them all.
If you want to try your hand at a data science project we strongly recommend “Practical Data Science with R.” Available from our publisher, Amazon.com, and other booksellers.
Feel free to visit here to inspect the following free material for “Practical Data Science with R”:
- Table of Contents
- Foreword
- Preface
- About this book
- Chapter 3
- Chapter 8
- Index
- help forum
- example data
- code
And some excerpts from Amazon reviews:
“This is the book that I wish was available when I was first learning Data Science.”
J. Fister
Paulo Nuin Suano
David M. Steier