Exploring a 3-D Synthetic Dataset
R-bloggers 2025-04-12
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
Exploring the HistData package
Over on Bluesky, I have been working through a few challenges. In February and March, I participated in the DuBois Challenge, where you take a week to recreate one of the powerful visualizations that W.E.B. Du Bois presented at the Paris Exposition. My work there, complete with code, can be found on my GitHub.
Inspired by this, I’ve also been doing the #30DayChartChallenge, where you make a chart a day on a theme that changes each day. I have taken this as an opportunity to explore Michael Friendly’s HistData package, which draws from his excellent book with Howard Wainer. I have done posts on John Snow, the Trial of the Pyx, Florence Nightingale, and others on my GitHub. However, one dataset that a simple plot doesn’t do justice to is the Pollen dataset. This dataset is synthetic and, like mtcars and flights, was used as a data challenge (the other two are now basic datasets for reprexes as well).
This dataset, however, shows the power of plotly.
library(tidyverse)
library(HistData)
library(plotly)
data("Pollen")
head(Pollen)

# A tibble: 6 × 5
   ridge   nub crack  weight density
   <dbl> <dbl> <dbl>   <dbl>   <dbl>
1  -2.35  3.63  5.03  10.9     -1.39
2  -1.15  1.48  3.24  -0.594    2.12
3  -2.52 -6.86 -2.80   8.46    -3.41
4   5.75 -6.51 -5.15   4.35   -10.3
5   8.75 -3.90 -1.38 -14.9     -2.42
6  10.4  -3.16 12.8  -14.9     -6.49
The first three variables (ridge, nub and crack) are meant to be plotted on the x, y and z axes, while the other two (weight and density) describe the grains of pollen. A quick correlation matrix shows at least one strong relationship that can be brought out through color: weight is strongly negatively correlated with ridge, the x-axis variable (r = -0.90).
res <- cor(Pollen)
round(res, 2)

        ridge   nub crack weight density
ridge    1.00  0.13 -0.13  -0.90   -0.57
nub      0.13  1.00  0.08  -0.17    0.33
crack   -0.13  0.08  1.00   0.27   -0.15
weight  -0.90 -0.17  0.27   1.00    0.24
density -0.57  0.33 -0.15   0.24    1.00
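Reading the strongest pair off a correlation matrix by eye gets harder as the number of variables grows. As a minimal sketch (the name cor_pairs is my own, not something from HistData), the matrix can be reshaped into a sorted table of variable pairs. Going by the matrix printed above, the ridge and weight pair should come out on top.

```r
library(HistData)

data("Pollen")

# Correlation matrix of all five variables
res <- cor(Pollen)

# Blank out the diagonal and the duplicated lower triangle,
# so each pair of variables appears only once
res[lower.tri(res, diag = TRUE)] <- NA

# Reshape to a long table (Var1, Var2, r), drop the blanked cells,
# and sort by the absolute size of the correlation
cor_pairs <- as.data.frame(as.table(res), responseName = "r")
cor_pairs <- cor_pairs[!is.na(cor_pairs$r), ]
cor_pairs <- cor_pairs[order(-abs(cor_pairs$r)), ]

head(cor_pairs, 3)
```

Sorting on the absolute value keeps strong negative correlations, like the one between ridge and weight, from being buried at the bottom of the list.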
However, when you plot the dataset, something else shows up. Thankfully, plotly allows you to drag a plot around to explore it.
plot_ly(Pollen, x = ~ridge, y = ~nub, z = ~crack) |>
  add_markers(color = ~weight, size = 2) |>
  layout(title = "David Coleman's Synthetic Pollen Dataset") |>
  config(displayModeBar = FALSE)
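Dragging is the main way to explore the plot, but it can also help to choose the angle the plot opens at. The sketch below is a variation on the code above, not part of the original analysis: it passes a camera setting through the scene argument of layout(). The eye coordinates are arbitrary values picked for illustration, so adjust them to taste.

```r
library(HistData)
library(plotly)

data("Pollen")

# Hypothetical starting viewpoint; these eye coordinates are
# illustration values only, not taken from the original plot
start_camera <- list(eye = list(x = 1.5, y = 1.5, z = 0.6))

p <- plot_ly(Pollen, x = ~ridge, y = ~nub, z = ~crack) |>
  add_markers(color = ~weight, size = 2) |>
  layout(
    title = "David Coleman's Synthetic Pollen Dataset",
    scene = list(camera = start_camera)  # sets the initial 3-D view
  ) |>
  config(displayModeBar = FALSE)

p
```

The plot can still be dragged around as before; the camera setting only changes where it starts.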