R Tip: How To Look Up Matrix Values Quickly
Win-Vector Blog 2020-03-30
R is a powerful data science language because, like Matlab, numpy, and Pandas, it exposes vectorized operations. That is, a user can perform operations on hundreds (or even billions) of cells by merely specifying the operation on the column or vector of values.
Of course, sometimes it takes a while to figure out how to do this. Please read for a great R matrix lookup problem and solution.
In R we can specify operations over vectors. For arithmetic this is easy, but some more complex operations you “need to know the trick.”
Patrick Freeman (@PTFreeman) recently asked: what is the idiomatic way to look up a bunch of values from a matrix by row and column keys? This is actually easy to do if we first expand the data matrix into RDF-triples. If our data were in this format we could merge/join it against our desired column indices.
Let’s start with an example data matrix.
# example matrix data
m <- matrix(1:9, nrow = 3)
row.names(m) <- c('R1' ,'R2', 'R3')
colnames(m) <- c('C1', 'C2', 'C3')
knitr::kable(m)
C1
C2
C3
R1
1
4
7
R2
2
5
8
R3
3
6
9
And our data-frame containing the indices we want to look-up.
# row/columns we want
w <- data.frame(
i = c('R1', 'R2', 'R2'),
j = c('C2', 'C3', 'C2'))
knitr::kable(w)
i
j
R1
C2
R2
C3
R2
C2
That is: we want to know the matrix values from [R1, C2], [R2, C3], and [R2, C2].
The trick is: how do we convert the matrix into triples? digEmAll, has a great solution to that here.
# unpack into 3-column format from:
# https://stackoverflow.com/a/9913601
triples <- data.frame(
i = rep(row.names(m), ncol(m)),
j = rep(colnames(m), each = nrow(m)),
v = as.vector(m))
knitr::kable(triples)
i
j
v
R1
C1
1
R2
C1
2
R3
C1
3
R1
C2
4
R2
C2
5
R3
C2
6
R1
C3
7
R2
C3
8
R3
C3
9
What the above code has done is: write each entry of the original matrix as a separate row with the original row and column ids landed as new columns. This data format is very useful.
The above code is worth saving as a re-usable snippet, as getting it right is a clever step.
Now we can vectorize our lookup using the merge command, which produces a new joined table where the desired values have been landed as a new column.
res <- merge(w, triples, by = c('i', 'j'))
knitr::kable(res)
i
j
v
R1
C2
4
R2
C2
5
R2
C3
8
And that is it: we have used vectorized and relational concepts to look up many values from a matrix very quickly.