column-store R or: how i learned to stop worrying and love monetdb
R-bloggers 2013-03-18
Summary:
(This article was first published on asdfree by anthony damico, and kindly contributed to R-bloggers)

"Combining R's sophisticated calculations and MonetDB's excellent data access performance is a no-brainer. One gets the best of two (open source) worlds with minimal hassle." - Dr. Hannes Mühleisen

"oh wow that was fast like a cheetah with a jetpack or something" - anthony damico

why try monetdb + r

a speed test of four analysis commands on sixty-seven million physician visit records using my personal laptop --

# calculate the sum of a single variable..
system.time( print( sum( carrier08$car_hcpcs_pmt_amt ) ) )
[1] 3477564780
   user  system elapsed
   0.00    0.00    0.04 seconds

# ..or calculate the sum, mean, median, and count of a single variable with sql
system.time( dbGetQuery( db , 'select sum( car_hcpcs_pmt_amt ), avg( car_hcpcs_pmt_amt ), median( car_hcpcs_pmt_amt ), count(*) from carrier08' ) )
   user  system elapsed
   0.01    0.00   16.86 seconds

# calculate the same statistics, broken down by six age and two gender categories
system.time( dbGetQuery( db , 'select bene_sex_ident_cd, bene_age_cat_cd, sum( car_hcpcs_pmt_amt ), avg( car_hcpcs_pmt_amt ), median( car_hcpcs_pmt_amt ), count(*) from carrier08 group by bene_sex_ident_cd, bene_age_cat_cd' ) )
   user  system elapsed
   0.00    0.01   36.03 seconds

# calculate the same statistics, broken down by six age, two gender,
# [...]