Optimal Meeting Point on the Paris Metro
R-bloggers 2013-03-20
Summary:
tl;dr: Play with the app here
When you live in Paris, chances are you are (home or work) very close to a metro station, so when you want to meet with some friends, you usually end up picking another metro station as a meeting point. Yet, finding the optimal place to meet can easily become a complex problem considering the dense network we have. Now that the RATP (the public transport operator in Paris) had made some of their datasets available, this sounds like a good job to be solved with R and Shiny.
In the spirit of the current open data movement, the RATP has made available a number of datasets under the Etalab license, and among them, two are of a particular interest for us:
- “Correspondances stations/lignes sur le réseau RATP”, which gives us all the stops (metro, RER, buses, and tram, but we will stick with the metro here…), along with the lines that they are associated with, in the format “station ID, line, station type”
- “Positions géographiques des stations du réseau RATP”, which gives us the geographical positions of all stations, in the format “station ID, longitude, latitude, station name, city, station type” The following lines of code allows you to import and readily use the datasets:
arret_ligne <- read.csv("ratp_arret_ligne.csv", header=F, sep="#",
col.names=c("ID","Ligne","Type"),
stringsAsFactors=F)
arret_positions <- read.csv("ratp_arret_graphique.csv", header=F, sep="#",
col.names=c("ID","X","Y","Nom","Ville","Type"),
stringsAsFactors=F)
To state our problem more clearly, we are given initially a set of n metro stops among all N possible, and we want to find S the optimal stop where to meet. A first step will involve computing the distances among all metro stops (shortest path, preferably on a time scale rather than a space scale!), and the second step is to find some kind of “barycenter” of these n stops. For these purposes, we model our metro network as a graph. The shortest path among two stops can be found using the very common Dijkstra algorithm, while defining the “barycenter” can be a bit cumbersome. Using a geographic barycenter doesn’t make any sense (we might end up in a place with no stop, or even with the closest stop being physically far away from a duration perspective). The next thing could be to think of this problem as finding the centroid of the cluster formed by our n stops, using something in the spirit of k-means (which doesn’t need actual points in space but only distances), and mapping this centroid to our larger network, but empirically the results didn’t look sound. No point in complicating things, another way to think of this is merely as a minimax problem: finding the stop which minimizes the maximum distance from each of the n initial stops to S. And this is actually very easy to implement in R!
There are two technical problems worth highlighting:
- First, the RATP doesn’t provide us with the dataset for the actual network, only a mapping from stations to lines. This means we don’t know the actual ordering of the stations on the line, and this is actually not a simple 1:N mapping since we have forks and isolated circles (think of line 13 after “La Fourche”, and line 7bis where there is no real terminus but a circle at the end of the line).
- Second, the RATP doesn’t provide us with the dataset for the transportation time between any two stops, as this might be a great competitive advantage they have for gathering metro users to their website and app. I have no reference for this, I’m only guessing! The solutions used here for these two problems are as follow:
- We boldly assume that stop A is connected to