Improved R Profiling Summaries

R-bloggers 2013-05-02

Summary:

In my last post I mentioned that I had improved on R’s summaryRprof() function with a custom function called proftable(). I’ve updated proftable() to take advantage of R 3.0.0’s ability to record line numbers while profiling. I’ve put it on github – you can get it there or below.

proftable reads in a file generated by Rprof() and creates an easy-to read table of the most time-consuming calls in your code, ordered from most time-consuming to least. Unlike summaryRprof(), it prints the entire call stack, so you can trace the origin of the time hogs. To make this easier to read, I lop off the “parent stack” common to all of the function calls, and display it just once, below the table. Here’s some example output:

> Rprof("profile1.out", line.profiling=TRUE)
> source("http://pastebin.com/download.php?i=KjdkSVZq")
> Rprof(NULL)
> proftable("profile1.out", lines=10)

 PctTime Call                                                      
 20.47   1#17 > [ > 1#17 > [.data.frame                            
  9.73   1#17 > [ > 1#17 > [.data.frame > [ > [.factor             
  8.72   1#17 > [ > 1#17 > [.data.frame > [ > [.factor > NextMethod
  8.39   == > Ops.factor                                           
  5.37   ==                                                        
  5.03   == > Ops.factor > noNA.levels > levels                    
  4.70   == > Ops.factor > NextMethod                              
  4.03   1#17 > [ > 1#17 > [.data.frame > [ > [.factor > levels    
  4.03   1#17 > [ > 1#17 > [.data.frame > dim                      
  3.36   1#17 > [ > 1#17 > [.data.frame > length                   

#File 1: http://pastebin.com/download.php?i=KjdkSVZq

Parent Call: source > withVisible > eval > eval >

Total Time: 5.96 seconds
Percent of run time represented: 73.8 %

Note that the “Parent Call” at the bottom shows some functions which RStudio wrapped my code in. Also, I chose only to display the top 10 time-consuming calls, but proftable told me that those 10 calls represent 73.8% of the run time. I find this display a lot more intuitive than summaryRprof()

Here’s the whole function. If you have improvements, fork it on github:

proftable <- function(file, lines=10) {
# require(plyr)
  interval <- as.numeric(strsplit(readLines(file, 1), "=")[[1L]][2L])/1e+06
  profdata <- read.table(file, header=FALSE, sep=" ", comment.char = "",
                         colClasses="character", skip=1, fill=TRUE,
                         na.strings="")
  filelines <- grep("#File", profdata[,1])
  files <- aaply(as.matrix(profdata[filelines,]), 1, function(x) {
                        paste(na.omit(x), collapse = " ") })
  profdata <- profdata[-filelines,]
  total.time <- interval*nrow(profdata)
  profdata <- as.matrix(profdata[,ncol(profdata):1])
  profdata <- aaply(profdata, 1, function(x) {
                      c(x[(sum(is.na(x))+1):length(x)],
                        x[seq(from=1,by=1,length=sum(is.na(x)))])
              })
  s

Link:

http://feedproxy.google.com/~r/RBloggers/~3/2MEGInukOQg/

From feeds:

Statistics and Visualization » R-bloggers

Tags:

Authors:

Noam Ross

Date tagged:

05/02/2013, 23:41

Date published:

05/02/2013, 19:48