[R] data.table’s frank()

R-bloggers 2025-04-12

[This article was first published on R on Zhenguo Zhang's Blog, and kindly contributed to R-bloggers].

Zhenguo Zhang’s Blog /2025/04/12/r-data-table-s-frank/ –

    knitr::opts_chunk$set(echo = TRUE, message = FALSE, warning = FALSE)
    library(knitr)
    library(data.table)

One can use data.table::frank() to rank the rows of a data.table or simply a vector. Compared to the base R function rank(), frank() is faster. Today I will show how to use this function.
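
Before moving on, here is a minimal sketch of the comparison with base R. The vector x is a made-up example (not from the data used below), and the timings are only indicative because they depend on the machine and the data.table version:

```r
library(data.table)

# Hypothetical example vector with ties (not part of the original post)
x <- c(10, 20, 10, 30, 20)

r_base <- rank(x)    # base R; ties.method = "average" is the default
r_fast <- frank(x)   # data.table; same default

print(r_base)
print(r_fast)

# Both functions should agree on the same input
stopifnot(isTRUE(all.equal(r_base, r_fast)))

# Rough timing on a larger input; exact numbers will vary
big <- sample(1e6)
print(system.time(rank(big)))
print(system.time(frank(big)))
```

On a vector, frank() can be used as a drop-in replacement for rank() with the same ties.method value.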

First, let’s generate an example data.table with 10 rows and 3 columns. For simplicity, the first two columns are integers and the last one is a character column. We also sample with replacement so that some values are duplicated, which lets us see how tied values are ranked:

    set.seed(123)
    n <- 10
    dt <- data.table(
      a = sample(1:10, n, replace = TRUE),
      b = sample(1:10, n, replace = TRUE),
      c = sample(letters[1:5], n, replace = TRUE)
    )
    kable(dt, caption = "Example data.table")

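Table 1: Example data.table

|  a |  b | c |
|---:|---:|:--|
|  3 |  5 | a |
|  3 |  3 | c |
| 10 |  9 | d |
|  2 |  9 | a |
|  6 |  9 | c |
|  5 |  3 | e |
|  4 |  8 | d |
|  6 | 10 | b |
|  9 |  7 | e |
| 10 | 10 | a |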

Now, let’s see how to use frank() to rank the whole data.table.

    dt[, rank := frank(.SD)]
    kable(dt[order(rank)], caption = "Ranked data.table")

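Table 2: Ranked data.table

|  a |  b | c | rank |
|---:|---:|:--|-----:|
|  2 |  9 | a |    1 |
|  3 |  3 | c |    2 |
|  3 |  5 | a |    3 |
|  4 |  8 | d |    4 |
|  5 |  3 | e |    5 |
|  6 |  9 | c |    6 |
|  6 | 10 | b |    7 |
|  9 |  7 | e |    8 |
| 10 |  9 | d |    9 |
| 10 | 10 | a |   10 |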

As you can see, frank() ranks the rows of the data.table by comparing the first column first, then using the second column to break ties, and finally the third column.
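
A minimal sketch to check this column-by-column behaviour, assuming a small made-up table toy with no duplicated rows: the ranks from frank() should match the positions given by order() on the same columns.

```r
library(data.table)

# Hypothetical 4-row table without duplicated rows (not the post's data)
toy <- data.table(a = c(2L, 1L, 2L, 1L),
                  b = c(1L, 9L, 0L, 3L))

# Rank rows using all columns: a first, then b to break ties
r <- frank(toy)

# order() sorts by a, then b; row o[i] is the i-th smallest row
o <- order(toy$a, toy$b)

print(r)
print(o)

# Without ties, the row at sorted position i must have rank i
stopifnot(all(r[o] == seq_along(o)))
```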

One can also rank a data.table based on selected columns. For example, let’s use the 2nd and 3rd columns to rank the data.table. One way to do this is the variant frankv(), which takes the column names as a character vector through its cols argument:

    dt[, rank := frankv(.SD, cols = c("b", "c"))]
    kable(dt[order(rank)], caption = "Ranked data.table by 2nd and 3rd columns")

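Table 3: Ranked data.table by 2nd and 3rd columns

|  a |  b | c | rank |
|---:|---:|:--|-----:|
|  3 |  3 | c |    1 |
|  5 |  3 | e |    2 |
|  3 |  5 | a |    3 |
|  9 |  7 | e |    4 |
|  4 |  8 | d |    5 |
|  2 |  9 | a |    6 |
|  6 |  9 | c |    7 |
| 10 |  9 | d |    8 |
| 10 | 10 | a |    9 |
|  6 | 10 | b |   10 |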
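
As a side note, the data.table documentation also shows frank() accepting unquoted column names, with a leading minus sign for descending order. The sketch below compares the two interfaces on a made-up table toy (an assumption for illustration, not the table used above):

```r
library(data.table)

# Hypothetical table (not the post's data)
toy <- data.table(a = 1:4,
                  b = c(5L, 3L, 5L, 3L),
                  c = c("x", "y", "a", "b"))

# Character-vector interface
r_v <- frankv(toy, cols = c("b", "c"))

# Unquoted-column interface; should give the same ranks
r_f <- frank(toy, b, c)
stopifnot(isTRUE(all.equal(r_v, r_f)))

# Descending on b, ascending on c, in both interfaces
r_desc_v <- frankv(toy, cols = c("b", "c"), order = c(-1L, 1L))
r_desc_f <- frank(toy, -b, c)
stopifnot(isTRUE(all.equal(r_desc_v, r_desc_f)))

print(data.table(toy, r_v, r_desc_v))
```

frankv() is convenient when the column names are stored in a variable, while frank() is shorter for interactive use.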

Finally, let’s talk about the ties.method argument. To keep it simple, we will rank the table using only the 2nd column so you can see the effect of each ties.method value.

    newDT <- dt[, .(b)]
    newDT[, rankAverage := frank(b, ties.method = "average")] # the default
    newDT[, rankFirst := frank(b, ties.method = "first")]
    newDT[, rankLast := frank(b, ties.method = "last")]
    newDT[, rankRandom := frank(b, ties.method = "random")]
    newDT[, rankMax := frank(b, ties.method = "max")]
    newDT[, rankMin := frank(b, ties.method = "min")]
    newDT[, rankDense := frank(b, ties.method = "dense")]
    kable(newDT[order(b)], caption = "Ranked data.table by 2nd column")

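Table 4: Ranked data.table by 2nd column

|  b | rankAverage | rankFirst | rankLast | rankRandom | rankMax | rankMin | rankDense |
|---:|------------:|----------:|---------:|-----------:|--------:|--------:|----------:|
|  3 |         1.5 |         1 |        2 |          2 |       2 |       1 |         1 |
|  3 |         1.5 |         2 |        1 |          1 |       2 |       1 |         1 |
|  5 |         3.0 |         3 |        3 |          3 |       3 |       3 |         2 |
|  7 |         4.0 |         4 |        4 |          4 |       4 |       4 |         3 |
|  8 |         5.0 |         5 |        5 |          5 |       5 |       5 |         4 |
|  9 |         7.0 |         6 |        8 |          7 |       8 |       6 |         5 |
|  9 |         7.0 |         7 |        7 |          8 |       8 |       6 |         5 |
|  9 |         7.0 |         8 |        6 |          6 |       8 |       6 |         5 |
| 10 |         9.5 |         9 |       10 |          9 |      10 |       9 |         6 |
| 10 |         9.5 |        10 |        9 |         10 |      10 |       9 |         6 |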

As you can see, the ties.method argument works as follows:

average: all tied values get the average of the ranks they span
first: ties are broken by order of appearance, so the tied value that appears first in the data gets the lowest rank
last: ties are broken in reverse order of appearance, so the tied value that appears last in the data gets the lowest rank
random: ties are broken in a random order
max: all tied values get the maximum rank of the tie set
min: all tied values get the minimum rank of the tie set
dense: the values in a tie set get the same rank, and the rank value increases by 1 when moving to the next tie set. This option is specific to frank() and is not available in base R's rank().
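
The definitions above can be checked on a tiny vector. This is a sketch on a made-up vector x (not the post's data). Since base R's rank() has no "dense" option, the last lines show one possible base R emulation using match(); that emulation is an illustration added here, not something the original post proposes.

```r
library(data.table)

# Hypothetical vector: sorted it reads 3, 3, 5, 7, 7
x <- c(7, 3, 7, 5, 3)

r_min   <- frank(x, ties.method = "min")
r_max   <- frank(x, ties.method = "max")
r_first <- frank(x, ties.method = "first")
r_last  <- frank(x, ties.method = "last")
r_dense <- frank(x, ties.method = "dense")

print(rbind(x, r_min, r_max, r_first, r_last, r_dense))

# One way to emulate dense ranks in base R (an illustration only):
# the position of each value among the sorted distinct values
dense_base <- match(x, sort(unique(x)))
stopifnot(all(dense_base == r_dense))
```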

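When one wants to use the rank to choose the top N rows, it is important to know how the rank is computed. With max, min, and dense (and also the default average), tied rows share the same rank, so a filter such as rank <= N can return more or fewer than N rows; in this case, you may want to avoid these ties.method values.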
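
To make this concrete, here is a small sketch with a made-up scores table (the table and its column names are assumptions for illustration). It selects the "top 3" rows by score under different ties.method values and counts how many rows come back:

```r
library(data.table)

# Hypothetical scores with ties at 90 and at 85
scores <- data.table(id = letters[1:5],
                     score = c(90, 85, 90, 70, 85))

# Negate the score so that the highest score gets rank 1
scores[, r_first := frank(-score, ties.method = "first")]
scores[, r_min   := frank(-score, ties.method = "min")]
scores[, r_max   := frank(-score, ties.method = "max")]
scores[, r_dense := frank(-score, ties.method = "dense")]

print(scores)

# Number of rows returned by a "top 3" filter under each method
n_first <- nrow(scores[r_first <= 3])
n_min   <- nrow(scores[r_min <= 3])
n_max   <- nrow(scores[r_max <= 3])
n_dense <- nrow(scores[r_dense <= 3])
print(c(first = n_first, min = n_min, max = n_max, dense = n_dense))

# Only "first" (like "last" and "random") assigns distinct ranks,
# so only it is guaranteed to return exactly 3 rows here
stopifnot(n_first == 3)
```

If tied rows should all be kept or all be dropped, min or max may still be the right choice; the point is only that the row count is then no longer guaranteed to be N.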

Happy programming
