Boot

Statistical Modeling, Causal Inference, and Social Science 2013-06-03

Joshua Hartshorne writes:

I ran several large-N experiments (separate participants) and looked at performance against age. What we want to do is compare age-of-peak-performance across the different tasks (again, different participants).

We bootstrapped age-of-peak-performance. On each iteration, we sampled (with replacement) X scores at each age, where X is the number of participants at that age, recorded the age at which performance peaked on that task, and repeated. Once we had distributions of age-of-peak-performance, we used the means and SDs to calculate t-statistics to compare the results across different tasks. For graphical presentation, we used medians, interquartile ranges, and 95% confidence intervals (based on the distributions: the range within which 75% and 95% of the bootstrapped peaks appeared).

While a number of people we consulted with thought this made a lot of sense, one reviewer of the paper insists that this is no good, writing that “constructing the confidence intervals directly from the corresponding percentiles of the bootstrap distribution [has been] shown to be relatively poor in asymmetric cases” and that alternative methods have been developed. No citations are given.
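For concreteness, the resampling procedure described in the email can be sketched roughly as follows. This is a minimal illustration in Python, not the code actually used: the simulated data, the function names, and the choice of 300 bootstrap draws are all assumptions made for the example. It resamples scores within each age, records the age with the highest mean, and then summarizes the resulting distribution of peaks.

```python
import random
import statistics


def simulate_task(true_peak, ages=range(20, 51), n_per_age=30, seed=0):
    """Hypothetical data: scores follow an inverted parabola in age plus noise."""
    rng = random.Random(seed)
    return {
        a: [-(a - true_peak) ** 2 / 25 + rng.gauss(0, 1) for _ in range(n_per_age)]
        for a in ages
    }


def peak_age(scores_by_age):
    """Age whose mean score is highest (ties go to the youngest age)."""
    return max(sorted(scores_by_age),
               key=lambda a: statistics.fmean(scores_by_age[a]))


def bootstrap_peaks(scores_by_age, n_boot=300, seed=0):
    """Resample scores with replacement within each age; record the peak age."""
    rng = random.Random(seed)
    peaks = []
    for _ in range(n_boot):
        resampled = {a: rng.choices(s, k=len(s)) for a, s in scores_by_age.items()}
        peaks.append(peak_age(resampled))
    return peaks


def percentile_interval_95(peaks):
    """2.5% and 97.5% points of the bootstrap distribution."""
    cuts = statistics.quantiles(peaks, n=40)  # 39 cut points, 2.5% apart
    return cuts[0], cuts[-1]


def peak_difference_statistic(peaks_a, peaks_b):
    """Difference in mean peak age divided by its bootstrap standard error."""
    se_a = statistics.stdev(peaks_a)  # bootstrap SD plays the role of the SE
    se_b = statistics.stdev(peaks_b)
    diff = statistics.fmean(peaks_a) - statistics.fmean(peaks_b)
    return diff / (se_a ** 2 + se_b ** 2) ** 0.5


# Two hypothetical tasks with different true peak ages, different participants.
peaks_a = bootstrap_peaks(simulate_task(true_peak=28, seed=1), seed=3)
peaks_b = bootstrap_peaks(simulate_task(true_peak=42, seed=2), seed=4)
print(percentile_interval_95(peaks_a), percentile_interval_95(peaks_b))
print(peak_difference_statistic(peaks_a, peaks_b))
```

One detail worth keeping straight in a sketch like this: the SD of the bootstrapped peaks is already the standard error of the estimated peak age, so it should not be divided by the square root of the number of bootstrap draws.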

My reply: I’m not really sure but here’s what I think. My instinct is that it would be better to fit a curve to each dataset rather than to just pick the age at which the raw data average is highest. You could, for example, fit a Gaussian process or even a lowess and find the age at which the fitted curve is maximized. I’m guessing that will be more accurate than taking the max of the raw data. Whatever data summary you use, though, getting standard errors via bootstrap seems reasonable to me. If N is large and the solution is generally not on the boundary, I think you can use estimates and standard errors; you shouldn’t need to bother with quantiles.
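Here is a rough sketch of what the fit-a-curve-then-bootstrap idea could look like. To keep it dependency-free, it swaps in a simple Gaussian kernel smoother in place of a Gaussian process or lowess; the bandwidth, the grid step, the simulated data, and all function names are assumptions made for illustration, not a recommendation.

```python
import math
import random
import statistics


def simulate_task(true_peak, ages=range(20, 51), n_per_age=30, seed=0):
    """Hypothetical data: scores follow an inverted parabola in age plus noise."""
    rng = random.Random(seed)
    return {
        a: [-(a - true_peak) ** 2 / 50 + rng.gauss(0, 1) for _ in range(n_per_age)]
        for a in ages
    }


def smoothed_peak(scores_by_age, bandwidth=3.0, step=0.5):
    """Kernel-smooth the scores over age; return the age maximizing the curve."""
    ages = sorted(scores_by_age)
    means = [statistics.fmean(scores_by_age[a]) for a in ages]
    counts = [len(scores_by_age[a]) for a in ages]

    def fitted(x):
        # Gaussian kernel weights, scaled by the number of participants per age.
        w = [n * math.exp(-0.5 * ((x - a) / bandwidth) ** 2)
             for a, n in zip(ages, counts)]
        return sum(wi * m for wi, m in zip(w, means)) / sum(w)

    n_steps = int(round((ages[-1] - ages[0]) / step))
    grid = [ages[0] + i * step for i in range(n_steps + 1)]
    return max(grid, key=fitted)


def bootstrap_smoothed_peaks(scores_by_age, n_boot=200, seed=0):
    """Resample within each age, refit the smooth curve, record its peak."""
    rng = random.Random(seed)
    return [
        smoothed_peak({a: rng.choices(s, k=len(s))
                       for a, s in scores_by_age.items()})
        for _ in range(n_boot)
    ]


data = simulate_task(true_peak=35, seed=1)
estimate = smoothed_peak(data)
boot = bootstrap_smoothed_peaks(data, seed=2)
se = statistics.stdev(boot)  # bootstrap SD serves as the standard error
print(f"estimated peak age: {estimate:.1f} +/- {se:.1f}")
```

The summary reported here is just an estimate and a standard error, which is what I’d expect to be adequate when N is large and the peak sits well inside the age range. The smoother itself is a placeholder; the bandwidth in particular would need some thought for real data.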
