"Analyzing large-scale data: Taxi Tipping behavior in NYC" (This Week at the Statistics Seminar)
Three-Toed Sloth 2016-02-23
Attention conservation notice: Only of interest if you (1) care about large-scale data analysis and/or taxis, and (2) will be in Pittsburgh on Thursday.
The last but by no means least seminar talk this week:
- Taylor Arnold, "Analyzing large-scale data: Taxi Tipping behavior in NYC"
- Abstract: Statisticians are increasingly tasked with providing insights from large streaming data sources, which can quickly grow to be terabytes or petabytes in size. In this talk, I explore novel approaches for applying classical and emerging techniques to large-scale datasets. Specifically, I discuss methodologies for expressing estimators in terms of the (weighted) Gramian matrix and other easily distributed summary statistics. I then present an abstraction layer for implementing chunk-wise algorithms that are interoperable over many parallel and distributed software frameworks. The utility and insights garnered from these methods are shown through an application to an event-based dataset provided by the New York City Taxi and Limousine Commission. I have joined these observations, which detail every registered taxicab trip from 2009 to the present, with external sources such as weather conditions and demographics. I use the aforementioned techniques to explore factors associated with taxi demand and the tipping behavior of riders. My focus is on developing novel techniques to facilitate interactive exploratory data analysis and to construct interpretable models at scale.
- Time and place: 4:30--5:30 pm on Thursday, 25 February 2016, in Baker Hall A51
As always, the talk is free and open to the public.
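For readers wondering what "expressing estimators in terms of the (weighted) Gramian matrix" buys you, here is a minimal sketch of the general idea, not the speaker's method: weighted least squares needs only the sums X^T W X and X^T W y, and since each is a sum over rows, every chunk of data can be reduced separately (on separate machines, if you like) and the partial sums simply added. The function names (`accumulate_chunk`, `solve_normal_equations`, `chunked_wls`) are hypothetical, and the sketch assumes each chunk arrives as plain Python lists of (X, y, w).

```python
from typing import Iterable, List, Sequence, Tuple

Matrix = List[List[float]]
Chunk = Tuple[Sequence[Sequence[float]], Sequence[float], Sequence[float]]


def accumulate_chunk(gram: Matrix, xty: List[float],
                     X: Sequence[Sequence[float]], y: Sequence[float],
                     w: Sequence[float]) -> None:
    """Add one chunk's contribution to X^T W X and X^T W y, in place."""
    p = len(gram)
    for row, yi, wi in zip(X, y, w):
        for j in range(p):
            wxj = wi * row[j]
            xty[j] += wxj * yi
            for k in range(p):
                gram[j][k] += wxj * row[k]


def solve_normal_equations(gram: Matrix, xty: List[float]) -> List[float]:
    """Solve (X^T W X) beta = X^T W y by Gaussian elimination with partial pivoting."""
    p = len(gram)
    # Work on an augmented copy so the accumulated statistics stay reusable.
    a = [gram[i][:] + [xty[i]] for i in range(p)]
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("Gram matrix is (numerically) singular")
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, p):
            factor = a[r][col] / a[col][col]
            for c in range(col, p + 1):
                a[r][c] -= factor * a[col][c]
    # Back-substitution on the upper-triangular system.
    beta = [0.0] * p
    for i in range(p - 1, -1, -1):
        s = a[i][p] - sum(a[i][c] * beta[c] for c in range(i + 1, p))
        beta[i] = s / a[i][i]
    return beta


def chunked_wls(chunks: Iterable[Chunk]) -> List[float]:
    """Fit weighted least squares from a stream of (X, y, w) chunks."""
    gram: Matrix = []
    xty: List[float] = []
    for X, y, w in chunks:
        if not X:
            continue  # skip empty chunks
        if not gram:
            p = len(X[0])
            gram = [[0.0] * p for _ in range(p)]
            xty = [0.0] * p
        accumulate_chunk(gram, xty, X, y, w)
    if not gram:
        raise ValueError("no data")
    return solve_normal_equations(gram, xty)
```

How the rows are split into chunks does not change the fit, because only the summed statistics reach the solver. One caveat of the approach: forming X^T W X squares the condition number of X, so nearly collinear predictors lose more precision than they would under a QR-based fit.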