Improved R Profiling Summaries
R-bloggers 2013-05-02
Summary:
In my last post I mentioned that I had improved on R’s summaryRprof() function with a custom function called proftable(). I’ve updated proftable() to take advantage of R 3.0.0’s ability to record line numbers while profiling. I’ve put it on github – you can get it there or below.
proftable reads in a file generated by Rprof() and creates an easy-to read table of the most time-consuming calls in your code, ordered from most time-consuming to least. Unlike summaryRprof(), it prints the entire call stack, so you can trace the origin of the time hogs. To make this easier to read, I lop off the “parent stack” common to all of the function calls, and display it just once, below the table. Here’s some example output:
> Rprof("profile1.out", line.profiling=TRUE)
> source("http://pastebin.com/download.php?i=KjdkSVZq")
> Rprof(NULL)
> proftable("profile1.out", lines=10)
PctTime Call
20.47 1#17 > [ > 1#17 > [.data.frame
9.73 1#17 > [ > 1#17 > [.data.frame > [ > [.factor
8.72 1#17 > [ > 1#17 > [.data.frame > [ > [.factor > NextMethod
8.39 == > Ops.factor
5.37 ==
5.03 == > Ops.factor > noNA.levels > levels
4.70 == > Ops.factor > NextMethod
4.03 1#17 > [ > 1#17 > [.data.frame > [ > [.factor > levels
4.03 1#17 > [ > 1#17 > [.data.frame > dim
3.36 1#17 > [ > 1#17 > [.data.frame > length
#File 1: http://pastebin.com/download.php?i=KjdkSVZq
Parent Call: source > withVisible > eval > eval >
Total Time: 5.96 seconds
Percent of run time represented: 73.8 %
Note that the “Parent Call” at the bottom shows some functions which RStudio wrapped my code in. Also, I chose only to display the top 10 time-consuming calls, but proftable told me that those 10 calls represent 73.8% of the run time. I find this display a lot more intuitive than summaryRprof()
Here’s the whole function. If you have improvements, fork it on github:
proftable <- function(file, lines=10) {
# require(plyr)
interval <- as.numeric(strsplit(readLines(file, 1), "=")[[1L]][2L])/1e+06
profdata <- read.table(file, header=FALSE, sep=" ", comment.char = "",
colClasses="character", skip=1, fill=TRUE,
na.strings="")
filelines <- grep("#File", profdata[,1])
files <- aaply(as.matrix(profdata[filelines,]), 1, function(x) {
paste(na.omit(x), collapse = " ") })
profdata <- profdata[-filelines,]
total.time <- interval*nrow(profdata)
profdata <- as.matrix(profdata[,ncol(profdata):1])
profdata <- aaply(profdata, 1, function(x) {
c(x[(sum(is.na(x))+1):length(x)],
x[seq(from=1,by=1,length=sum(is.na(x)))])
})
s