Tables Are Like Cockroaches

R-bloggers 2013-04-03

(This article was first published on Timely Portfolio, and kindly contributed to R-bloggers)

As much as I would like to completely replace all tables with beautiful, intuitive, and interactive charts, tables like cockroaches cannot be eliminated. Based on this very interesting discussion on the Perceptual Edge forum with source Exploring the Origins of Tables for Information Visualization, tables date back to 1850 BCE. The paper concludes with

As part of exploration, tables help answer questions about data. As exemplars of communication, tables provide effective means for presenting data - each table has a story or stories to tell.

After struggling to create some attractive tables in HTML with R, I'm not sure they are any easier to create almost 4,000 years later. LaTeX is the clear winner when it comes to the table making competition. I have used xtable for HTML tables, but it could not fully produce a complicated table.  I was delighted to recently find the Gmisc package, which is the result of a frustrated orthopaedic surgeon's need to create tables in Word for journal submission.

pretty table from gforge.se

Another R to Word workflow was also discussed in Writing a MS-Word document using R (with as little overhead as possible). I was not aware of the need to produce a Word .doc from R. I simply thought creating an html table should not be that hard.

If you read Old Price Tables in Modern d3 Visualization and Dust off 130 Year Old Gold Books on Google Bookshelf, you'll know that my new favorite book is Gold and Prices Since 1873 by James Laurence Laughlin. This table on page 26 seems easy enough to recreate.

I had no idea how recreating this table would test and enhance my R skills. I started by manually entering the data since OCR did not work.

reps = c("http://ftp.sunet.se/pub/lang/CRAN", "http://cran.gforge.se") install.packages("Gmisc", repos = reps, dependencies = TRUE) 
library(Gmisc, verbose = FALSE)  # manually enter the data in a data frame data1874 <- data.frame(c(1872, 1874, 1869, 1870, 1870, 1871, 1871, 1871, 1870,      1873, 1872, 1871), c(153825, 41380, 131800, 15447, 4893, 2109, 16651, 80361,      1749, 7058, 3801, 18900), c(0, 0, 106600, 33695, 14230, 55320, 37160, 4775,      4325, 1535, 6980, 0), c(198540, 20580, 274100, 88487, 40505, 62857, 119000,      429486, 7327, 11794, 16877, 284561))  rownames(data1874) <- c("Banks of the United Kingdom", "Banks of Australia",      "Banks of France", "Banks of Italy", "National Bank of Belgium", "Bank of the Netherlands",      "Bank of Austria-Hungary", "Imperial State Bank of Russia", "Imperial Bank of Sweden",      "Bank of Norway", "National Bank of Denmark", "National Bank of the United States") colnames(data1874) <- c(" ", "Gold", "Silver", "Total Note Circulation")  data1885 <- data.frame(c(141205, 65890, 231483, 56121, 13900, 19161, 25902,      102207, 3436, 7169, 11566, 158100), c(0, 0, 217087, 11203, 6540, 38366,      48646, 676, 777, 0, 846, 7900), c(186850, 28115, 583610, 189690, 73400,      76972, 136351, 429860, 9835, 9287, 18370, 276500)) colnames(data1885) <- colnames(data1874)[2:4]   # get sums for totals row in table data1874[NROW(data1874) + 1, ] = apply(data1874, MARGIN = 2, FUN = sum) data1885[NROW(data1885) + 1, ] = apply(data1885, MARGIN = 2, FUN = sum) # add Total to row names rownames(data1874)[NROW(data1874)] = "Total" rownames(data1885)[NROW(data1885)] = "Total" # eliminate the sum of years which does not make sense data1874[NROW(data1874), 1] = ""   # get commas in the numbers data1874[, 2:4] <- format(data1874[, 2:4], big.mark = ",") data1885 <- format(data1885, big.mark = ",") 

Then with gmisc I very quickly achieved a decent table.

# use htmlTable to produce a table htmlTable(cbind(data1874, data1885), caption = "", rowlabel = "", cgroup = c("Reserves",      "", "Reserves", ""), n.cgroup = c(3, 1, 2, 1, 0), ctable = TRUE, output = TRUE) 
Reserves   Reserves   GoldSilver Total Note Circulation GoldSilver Total Note CirculationBanks of the United Kingdom1872153,8250 198,540 141,2050 186,850Banks of Australia187441,3800 20,580 65,8900 28,115Banks of France1869131,800106,600 274,100 231,483217,087 583,610Banks of Italy187015,44733,695 88,487 56,12111,203 189,690National Bank of Belgium18704,89314,230 40,505 13,9006,540 73,400Bank of the Netherlands18712,10955,320 62,857 19,16138,366 76,972Bank of Austria-Hungary187116,65137,160 119,000 25,90248,646 136,351Imperial State Bank of Russia187180,3614,775 429,486 102,207676 429,860Imperial Bank of Sweden18701,7494,325 7,327 3,436777 9,835Bank of Norway18737,0581,535 11,794 7,1690 9,287National Bank of Denmark18723,8016,980 16,877 11,566846 18,370National Bank of the United States187118,9000 284,561 158,1007,900 276,500Total477,974264,620 1,554,114 836,140332,041 2,018,840

 

However, the complicated multiple row heading was still missing, so here is the much harder brute force work to parse the HTML to get the table structured correctly.

# do all the hard work to make the table more of an exact replica gtable_table <- htmlTable(cbind(data1874, data1885), caption = "", rowlabel = "",      cgroup = c("Reserves", "", "Reserves", ""), n.cgroup = c(3, 1, 2, 1, 0),      ctable = TRUE, output = FALSE)  require(XML) # parse the table so that we can access the elements in a very crude # manner doc <- htmlParse(gtable_table) # add another row heading to the table with XML temp <- addChildren(getNodeSet(doc, "//thead")[[1]], newXMLNode("tr", list(newXMLNode("th",      attrs = list(colspan = "1", style = "font-weight: 900; border-top: 2px solid grey;border-right: 1px solid grey;"),      text = ""), newXMLNode("th", attrs = list(colspan = "6", style = "font-weight: 900; border-top: 2px solid grey;"),      text = "1870-1874"), newXMLNode("th", attrs = list(colspan = "5", style = "font-weight: 900; border-top: 2px solid grey;"),      text = "1885"))), at = 0)    # add some vertical borders; wish this were easier but very manual th <- getNodeSet(doc, "//thead//th")  #start with the th elements in thead for (i in c(1, 2, 4, 8, 12, 18)) {     oldstyle <- xmlAttrs(th[[i]])["style"]  #get the old style attribute     removeAttributes(th[[i]], attrs = "style")  #remove the style attribute     addAttributes(th[[i]], style = paste(oldstyle, "border-right: 1px solid grey;",          sep = ""))  #add the old style attribute concatenated with border-right }  th <- getNodeSet(doc, "//tbody//td")  #now do the td elements in tbody for (i in c(seq(1, 133, by = 11), seq(7, 139, by = 11))) {     oldstyle <- xmlAttrs(th[[i]])["style"]  #get the old style attribute     removeAttributes(th[[i]], attrs = "style")  #remove the style attribute     addAttributes(th[[i]], style = paste(oldstyle, "border-right: 1px solid grey;",          sep = ""))  #add the old style attribute concatenated with border-right }  # although htmlTable will group rows, I could not make it do what I wanted # so add underline before the total row for (i in 133:143) {     oldstyle <- xmlAttrs(th[[i]])["style"]  #get the old style attribute     removeAttributes(th[[i]], attrs = "style")  #remove the style attribute     addAttributes(th[[i]], style = paste(oldstyle, "border-top: 1px solid grey;",          sep = ""))  #add the old style attribute concatenated with border-top }   # for some reason &nbsp; becomes Â, so reverse it back to $nbsp; returnHTML <- gsub("[Â].", replacement = "&nbsp;", saveXML(getNodeSet(doc, "//table")[[1]]))  # not sure if necessary but free up doc from memory free(doc)  cat(returnHTML) 
1870-18741885Reserves   Reserves   GoldSilver Total Note Circulation GoldSilver Total Note CirculationBanks of the United Kingdom1872153,8250 198,540 141,2050 186,850Banks of Australia187441,3800 20,580 65,8900 28,115Banks of France1869131,800106,600 274,100 231,483217,087 583,610Banks of Italy187015,44733,695 88,487 56,12111,203 189,690National Bank of Belgium18704,89314,230 40,505 13,9006,540 73,400Bank of the Netherlands18712,10955,320 62,857 19,16138,366 76,972Bank of Austria-Hungary187116,65137,160 119,000 25,90248,646 136,351Imperial State Bank of Russia187180,3614,775 429,486 102,207676 429,860Imperial Bank of Sweden18701,7494,325 7,327 3,436777 9,835Bank of Norway18737,0581,535 11,794 7,1690 9,287National Bank of Denmark18723,8016,980 16,877 11,566846 18,370National Bank of the United States187118,9000 284,561 158,1007,900 276,500Total477,974264,620 1,554,114 836,140332,041 2,018,840

 

This can be even further improved with some simple CSS.  I hope this helps somebody.  Oh, I just also remembered dprint which I will revisit soon. For now I think I'll go back to making graphs.

Gist Source:

To leave a comment for the author, please follow the link and comment on his blog: Timely Portfolio.

R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series,ecdf, trading) and more...