Assessing the Price of Solid State Hard Drives
ggplot2 2013-03-15
Summary:
During a staff meeting at work, the price of solid state hard drives came up (what do they cost, is price nonlinear in size, etc.). I decided to sample 120 solid state hard drives from newegg.com and recorded their size (in GB) and price (in USD) as well as their class (SATA II or SATA III). Note that the sampling was semi-random, in that I had no particular agenda but did not go to great lengths to sample randomly. To look at this, I used ggplot2.
ssd <- read.csv("http://joshuawiley.com/files/ssd.csv")
ssd$class <- factor(ssd$class)
require(ggplot2)

## first pass
p <- ggplot(ssd, aes(x = price, y = size, colour = class)) +
  geom_point()
print(p)
Not too bad, but the data are sparser at higher sizes and prices, so we can use a log-log scale to make it a little easier to see, and add locally weighted regression (loess) lines to assess linearity (or lack thereof).
## add smooths and log to make clearer
## (breaks start at 100 because 0 has no position on a log scale)
p <- p +
  stat_smooth(method = "loess", se = FALSE) +
  scale_x_log10(breaks = seq(100, 1000, 100)) +
  scale_y_log10(breaks = seq(100, 600, 100))
print(p)
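Why look for straight lines on log-log axes? If size followed a power law in price, size = a * price^b, then log10(size) = log10(a) + b * log10(price), which is a straight line with slope b. The sketch below checks this on synthetic data. The values of a and b and the simulated prices are made up for illustration; they are not estimates from the Newegg sample.

```r
## Synthetic illustration: a power law is a straight line on log-log axes.
## All numbers here are made up; they are not from the Newegg sample.
set.seed(1)
a <- 0.5   # hypothetical scale factor
b <- 1.1   # hypothetical exponent

## prices spread evenly on the log scale between 50 and 1000
price <- exp(runif(120, log(50), log(1000)))
## power law with multiplicative noise
size <- a * price^b * exp(rnorm(120, sd = 0.1))

## regress log10(size) on log10(price); the slope estimates b
## and the intercept estimates log10(a)
fit <- lm(log10(size) ~ log10(price))
intercept <- unname(coef(fit)[1])
slope <- unname(coef(fit)[2])
print(round(c(intercept = intercept, slope = slope), 2))
```

If the loess lines on the real data bend noticeably on the log-log plot, a single power law of this form would not describe the relationship well.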
The log-log plot is easier to read. Lastly, let's add better labels, make the x-axis text not overlap, and include the intercept and slope parameters for the linear lines of best fit for each class of hard drive.
## fit separate intercept and slope model
## (class + class:price gives one intercept and one slope per class)
m <- lm(size ~ 0 + class + class:price, data = ssd)
est <- round(coef(m), 2)
size2 <- paste0("II Size = ", est[1], " + ", est[3], " * price")
size3 <- paste0("III Size = ", est[2], " + ", est[4], " * price")

## finalize
p <- p +
  annotate("text", x = 100, y = 600, label = size2) +
  annotate("text", x = 100, y = 500, label = size3) +
  labs(x = "Price in USD", y = "Size in GB",
       title = "Log-Log Plot of SSD Size and Price") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1, vjust = 1))
print(p)
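A note on reading per-class slopes off an interaction model: `0 + class*price` and `0 + class + class:price` describe the same fitted lines, but they report the slopes differently. With `class*price`, the fourth coefficient is the difference between the two slopes. With `class + class:price`, each class gets its own slope coefficient. The sketch below shows this on synthetic data. The data frame `d` and the "true" intercepts and slopes are hypothetical, chosen only to make the comparison visible.

```r
## Synthetic illustration of two parameterizations of the same model.
## Made-up data: two classes with different intercepts and slopes.
set.seed(2)
d <- data.frame(
  class = factor(rep(c("II", "III"), each = 60)),
  price = runif(120, 50, 1000)
)
true_int   <- c(II = 10, III = 20)     # hypothetical intercepts
true_slope <- c(II = 0.4, III = 0.7)   # hypothetical slopes
cls <- as.character(d$class)
d$size <- unname(true_int[cls] + true_slope[cls] * d$price) +
  rnorm(120, sd = 5)

## class*price: reference slope plus a slope difference
m_star <- lm(size ~ 0 + class * price, data = d)
## class + class:price: one slope per class
m_sep <- lm(size ~ 0 + class + class:price, data = d)

print(round(coef(m_star), 3))  # classII, classIII, price, classIII:price
print(round(coef(m_sep), 3))   # classII, classIII, classII:price, classIII:price

## the class III slope is the same either way, once the pieces are added up
slope_III_star <- unname(coef(m_star)["price"] + coef(m_star)["classIII:price"])
slope_III_sep  <- unname(coef(m_sep)["classIII:price"])
print(all.equal(slope_III_star, slope_III_sep))
```

Either form works for the annotation, as long as the label for the second class uses the full slope and not the difference.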
(guest post by Joshua Wiley)