R Tip: Use match_order() to Align Data
Win-Vector Blog 2018-04-10
R tip: Use wrapr::match_order() to align data.
Suppose we have data in two data frames, and both of these data frames have a common row-identifying column called "idx".
library("wrapr")

d1 <- build_frame(
  "idx", "x" |
  3    , "a" |
  1    , "b" |
  2    , "c" )

d2 <- build_frame(
  "idx", "y" |
  2    , "D" |
  1    , "E" |
  3    , "F" )

print(d1)
#>   idx x
#> 1   3 a
#> 2   1 b
#> 3   2 c

print(d2)
#>   idx y
#> 1   2 D
#> 2   1 E
#> 3   3 F
(Please see R Tip: Think in Terms of Values for build_frame() and other value capturing tools.)
Often we wish to work with such data aligned so that each row in d2 has the same idx value as the corresponding row (by row order) of d1. This is an important data wrangling task, so there are many ways to achieve it in R, such as base::merge(), dplyr::left_join(), or sorting both tables into the same order and then using base::cbind().
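As a rough illustration of two of these approaches, here is a minimal base-R sketch. It rebuilds the example tables with data.frame() (an assumption made only to keep the snippet free of package dependencies) and shows both base::merge() and the sort-then-cbind() route. Note that both leave the rows in idx order, not in the original order of d1.

```r
# Same example data as above, built with base R only.
d1 <- data.frame(idx = c(3, 1, 2), x = c("a", "b", "c"), stringsAsFactors = FALSE)
d2 <- data.frame(idx = c(2, 1, 3), y = c("D", "E", "F"), stringsAsFactors = FALSE)

# Approach 1: merge on the shared key. By default merge() sorts the
# result by the key column, so d1's original row order is not kept.
merged <- merge(d1, d2, by = "idx")
print(merged)

# Approach 2: sort both tables by idx, confirm the keys line up,
# then bind the columns side by side.
d1_sorted <- d1[order(d1$idx), , drop = FALSE]
d2_sorted <- d2[order(d2$idx), , drop = FALSE]
stopifnot(identical(d1_sorted$idx, d2_sorted$idx))
combined <- cbind(d1_sorted, d2_sorted["y"])
print(combined)
```

The stopifnot() check is a deliberate guard: cbind() pairs rows purely by position, so it is only safe once the key columns are known to agree.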
However, if you wish to preserve the order of the first table (which may not be sorted), you need one more trick.
You can add a row-id column, sort by the joining id, combine and then re-sort by the row-id column.
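The row-id recipe just described can be sketched in base R as follows. This is an illustrative version, not code from wrapr; the helper column name row_id is a hypothetical choice, and the example tables are rebuilt with data.frame() to keep the snippet self-contained.

```r
d1 <- data.frame(idx = c(3, 1, 2), x = c("a", "b", "c"), stringsAsFactors = FALSE)
d2 <- data.frame(idx = c(2, 1, 3), y = c("D", "E", "F"), stringsAsFactors = FALSE)

# 1. Record d1's original row order (row_id is a hypothetical helper column).
d1$row_id <- seq_len(nrow(d1))

# 2. Sort both tables by the joining id.
d1_sorted <- d1[order(d1$idx), , drop = FALSE]
d2_sorted <- d2[order(d2$idx), , drop = FALSE]

# 3. Combine, after checking that the keys agree row by row.
stopifnot(identical(d1_sorted$idx, d2_sorted$idx))
combined <- cbind(d1_sorted, d2_sorted["y"])

# 4. Re-sort by the row id to restore d1's order, then drop the helper.
combined <- combined[order(combined$row_id), , drop = FALSE]
combined$row_id <- NULL
rownames(combined) <- NULL
print(combined)
```

After step 4 the idx column reads 3, 1, 2 again, with y aligned as F, E, D.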
Or you can match the orders in one step using wrapr::match_order().
p <- match_order(d2$idx, d1$idx)
print(d2[p, , drop=FALSE])
#>   idx y
#> 3   3 F
#> 2   1 E
#> 1   2 D
match_order() merely wraps the sort and re-sort tricks mentioned above; the theory, however, is based on the absolute magic of associative array indexing.
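To make the idea concrete, here is a hedged base-R sketch of three ways to compute such a permutation when the keys are unique and both vectors hold the same set of values. None of this is claimed to be wrapr's actual implementation; the variable names (p_perm, p_assoc, p_match) are illustrative only.

```r
ids_to_permute <- c(2, 1, 3)  # plays the role of d2$idx
ids_to_match   <- c(3, 1, 2)  # plays the role of d1$idx

# (a) Compose sort permutations: sort the first vector, then undo the
#     sort of the second vector by applying its inverse permutation.
p1 <- order(ids_to_permute)
p2 <- order(ids_to_match)
p2_inv <- seq_along(p2)
p2_inv[p2] <- seq_along(p2)
p_perm <- p1[p2_inv]

# (b) Associative array indexing: a named vector maps each key
#     (as a string) to its position, and we look keys up by name.
pos <- seq_along(ids_to_permute)
names(pos) <- as.character(ids_to_permute)
p_assoc <- unname(pos[as.character(ids_to_match)])

# (c) base::match() gives the same positions directly for unique keys.
p_match <- match(ids_to_match, ids_to_permute)

# All three reorder ids_to_permute into the order of ids_to_match.
stopifnot(identical(ids_to_permute[p_perm], ids_to_match))
print(p_perm)
```

For this data all three give the permutation 3, 2, 1, which agrees with the row order printed by the match_order() example above.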
Please see R Tip: Use drop = FALSE with data.frames, for why one should get in the habit of writing drop = FALSE.