Transparency in climate science « RealClimate

ab1630's bookmarks 2018-05-13

Summary:

"Good thing? Of course.* I was invited to give a short presentation to a committee at the National Academies last week on issues of reproducibility and replicability in climate science for a report they have been asked to prepare by Congress. My slides give a brief overview of the points I made, but basically the issue is not that there isn’t enough data being made available, but rather there is too much! A small selection of climate data sources is given on our (cleverly named) “Data Sources” page and these and others are enormously rich repositories of useful stuff that climate scientists and the interested public have been diving into for years. Claims that have persisted for decades that “data” aren’t available are mostly bogus (to save the commenters the trouble of angrily demanding it, here is a link for data from the original hockey stick paper. You’re welcome!). The issues worth talking about are, however, a little more subtle. First off, what definitions are being used here? This committee has decided that formally: Reproducibility is the ability to test a result using independent methods and alternate choices in data processing. This is akin to a different laboratory testing an experimental result, or a different climate model showing the same phenomena, etc. Replicability is the ability to check and rerun the analysis and get the same answer. [Note that these definitions are sometimes swapped in other discussions.] The two ideas are probably best described as checking the robustness of a result, or rerunning the analysis. Both are useful in different ways. Robustness is key if you want to make a case that any particular result is relevant to the real world (though that is necessary, not sufficient) and if a result is robust, there’s not much to be gained from rerunning the specifics of one person’s/one group’s analysis.
For sure, rerunning the analysis is useful for checking that the conclusions stem from the raw data, and is a great platform for subsequently testing its robustness (by making different choices for input data, analysis methods, etc.) as efficiently as possible. So what issues are worth talking about? First, the big success in climate science with respect to robustness/reproducibility is the Coupled Model Intercomparison Project – all of the climate models from labs across the world running the same basic experiments with an open data platform that makes it easy to compare and contrast many aspects of the simulations. However, this data set is growing very quickly and the tools to analyse it have not scaled as well. So, while everything is testable in theory, bandwidth and computational restrictions make it difficult to do so in practice. This could be improved with appropriate server-side analytics (which are promised this time around) and the organized archiving of intermediate and derived data. Analysis code sharing in a more organized way would also be useful...."

Date tagged:

05/13/2018, 14:35

Date published:

05/13/2018, 10:34
