R Tip: Use match_order() to Align Data

Win-Vector Blog 2018-04-10

R tip. Use wrapr::match_order() to align data.

Suppose we have data in two data frames, and both data frames share a row-identifying column called “idx”.

library("wrapr")

d1 <- build_frame(
   "idx", "x" |
   3    , "a" |
   1    , "b" |
   2    , "c" )

d2 <- build_frame(
   "idx", "y" |
   2    , "D" |
   1    , "E" |
   3    , "F" )

print(d1)
#>   idx x
#> 1   3 a
#> 2   1 b
#> 3   2 c

print(d2)
#>   idx y
#> 1   2 D
#> 2   1 E
#> 3   3 F

(Please see R Tip: Think in Terms of Values for build_frame() and other value-capturing tools.)

Often we wish to work with such data aligned, so that each row of d2 has the same idx value as the corresponding row (by row order) of d1. This is an important data wrangling task, so there are many ways to achieve it in R, such as base::merge(), dplyr::left_join(), or sorting both tables into the same order and then using base::cbind().
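As a rough illustration of two of these options, here is a base R sketch. It rebuilds the example data with data.frame() so it runs on its own; the names joined, d1s, d2s, and bound are only illustrative.

```r
# Rebuild the example data with base R so this snippet is self-contained.
d1 <- data.frame(idx = c(3, 1, 2), x = c("a", "b", "c"), stringsAsFactors = FALSE)
d2 <- data.frame(idx = c(2, 1, 3), y = c("D", "E", "F"), stringsAsFactors = FALSE)

# Option 1: join on the shared key. merge() sorts on the "by" column by default.
joined <- merge(d1, d2, by = "idx")

# Option 2: sort both tables by idx, then bind the columns side by side.
d1s <- d1[order(d1$idx), , drop = FALSE]
d2s <- d2[order(d2$idx), , drop = FALSE]
stopifnot(all(d1s$idx == d2s$idx))   # rows must line up before cbind()
bound <- cbind(d1s, y = d2s$y)

print(joined)
print(bound)
```

Both results come back ordered by idx (1, 2, 3), not in the original row order of d1 (3, 1, 2).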

However, if you wish to preserve the order of the first table (which may not be sorted), you need one more trick.

You can add a row-id column, sort by the joining id, combine, and then re-sort by the row-id column.
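A minimal base R sketch of that row-id approach follows. The helper column name row_id and the intermediate names d1s, d2s, and combined are hypothetical, and the data is rebuilt with data.frame() so the snippet is self-contained. It assumes both tables hold the same set of unique idx values.

```r
d1 <- data.frame(idx = c(3, 1, 2), x = c("a", "b", "c"), stringsAsFactors = FALSE)
d2 <- data.frame(idx = c(2, 1, 3), y = c("D", "E", "F"), stringsAsFactors = FALSE)

# Step 1: remember d1's original row order in a helper column.
d1$row_id <- seq_len(nrow(d1))

# Step 2: sort both tables by the joining id.
d1s <- d1[order(d1$idx), , drop = FALSE]
d2s <- d2[order(d2$idx), , drop = FALSE]

# Step 3: combine, after checking that the rows line up on idx.
stopifnot(all(d1s$idx == d2s$idx))
combined <- cbind(d1s, y = d2s$y)

# Step 4: re-sort by the saved row id, then drop the helper column.
combined <- combined[order(combined$row_id), , drop = FALSE]
combined$row_id <- NULL
rownames(combined) <- NULL

print(combined)
#>   idx x y
#> 1   3 a F
#> 2   1 b E
#> 3   2 c D
```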

Or you can match the orders in one step using wrapr::match_order().

p <- match_order(d2$idx, d1$idx)

print(d2[p, , drop = FALSE])
#>   idx y
#> 3   3 F
#> 2   1 E
#> 1   2 D

match_order() merely wraps the sort and re-sort tricks mentioned above; the underlying theory, however, rests on the absolute magic of associative array indexing.
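To make the indexing idea concrete, here is a base R sketch that builds the same kind of permutation by assigning through an index vector. This is an illustration only, not necessarily how wrapr implements match_order() internally. It assumes both idx columns contain the same set of unique values; the names p2 and aligned are illustrative.

```r
d1 <- data.frame(idx = c(3, 1, 2), x = c("a", "b", "c"), stringsAsFactors = FALSE)
d2 <- data.frame(idx = c(2, 1, 3), y = c("D", "E", "F"), stringsAsFactors = FALSE)

# order(d2$idx) lists d2's rows in sorted-idx order; order(d1$idx) lists the
# d1 positions those sorted values occupy. Assigning one through the other
# sends each d2 row number to the position of the matching d1 row.
p <- integer(nrow(d1))
p[order(d1$idx)] <- order(d2$idx)

# base::match() yields the same permutation when the ids are unique.
p2 <- match(d1$idx, d2$idx)

# Check the alignment before combining.
stopifnot(all(d2$idx[p] == d1$idx), all(p == p2))

# d1 keeps its original row order; y is pulled from d2 in matching order.
aligned <- cbind(d1, y = d2$y[p])
print(aligned)
```

With the permutation in hand, d1 never has to be sorted or re-sorted; only d2 is re-indexed.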

Please see R Tip: Use drop = FALSE with data.frames for why one should get in the habit of writing drop = FALSE.