“Responsibility for Raw Data”: “Failure to retain data for some reasonable length of time following publication would produce notoriety equal to the notoriety attained by publishing inaccurate results. A possibly more effective means of controlling quality of publication would be to institute a system of quality control whereby random samples of raw data from submitted journal articles would be requested by editors and scrutinized for accuracy and the appropriateness of the analysis performed.”

Statistical Modeling, Causal Inference, and Social Science 2024-10-09

Leroy Wolins writes:

Last spring a graduate student at Iowa State University required data of a particular kind in order to carry out a study for his master’s thesis. In order to obtain these data he wrote to 37 authors whose journal articles appeared in APA journals between 1959 and 1961. Of these authors, 32 replied. Twenty-one of these reported the data misplaced, lost, or inadvertently destroyed. Two of the remaining 11 offered their data on the conditions that they be notified of our intended use of their data, and stated that they have control of anything that we would publish involving these data. We met the former condition but refused the latter for those two authors since we felt the raw data from published research should be made public upon request when possible and economically feasible. Thus raw data from 9 authors were obtained. From these 9 authors, 11 analyses were obtained. Four of these were not analyzed by us since they were made available several months after our request. Of the remaining 7 studies, 3 involved gross errors. One involved an analysis of variance on transformed data where the transformation was clearly inappropriate. Another analysis contained a gross computational error so that several F ratios near one were reported to be highly significant. The third analysis incorrectly reported insignificant results due to the use of an inappropriate error term.

We have a dilemma. In one way it does not seem fair to report these errors, or in some way cause them to be reported, for those authors who behaved in the best interest of science by retaining their data and submitting them to us; whereas the authors who innocently (?) lost their data, misplaced their data, etc., go free from criticism. On the other hand one might argue that a scientist should report errors when he finds them.

We completely accept the responsibility of dealing with the present dilemma but we wish to share with the membership of the American Psychological Association responsibility for dealing with the conditions that produced it. If it were clearly set forth by the APA that the responsibility for retaining raw data and submitting them for scrutiny upon request lies with the author, this dilemma would not exist. Failure to retain data for some reasonable length of time following publication would produce notoriety equal to the notoriety attained by publishing inaccurate results. A possibly more effective means of controlling quality of publication would be to institute a system of quality control whereby random samples of raw data from submitted journal articles would be requested by editors and scrutinized for accuracy and the appropriateness of the analysis performed.

The above was published as a letter in the journal American Psychologist . . . in 1962!
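The counts in Wolins's letter form a funnel, and it helps to tally them in one place. The sketch below only restates the numbers given in the letter. The variable names are mine, and it assumes that the "7 studies" in the letter are the same units as the "11 analyses" mentioned one sentence earlier.

```python
# Counts taken directly from Wolins's 1962 letter.
authors_contacted = 37
authors_replied = 32
data_lost = 21           # reported misplaced, lost, or inadvertently destroyed
conditions_refused = 2   # data offered only with control over publication
analyses_received = 11   # from the authors who actually supplied data
analyses_too_late = 4    # arrived months after the request, so not checked
gross_errors = 3

# Derived counts; each should match a figure stated in the letter.
authors_with_data = authors_replied - data_lost             # "the remaining 11"
authors_supplying = authors_with_data - conditions_refused  # "9 authors"
analyses_checked = analyses_received - analyses_too_late    # "the remaining 7"

# Shares at each stage of the funnel (assumption: studies == analyses).
share_replied = authors_replied / authors_contacted
share_lost_among_replies = data_lost / authors_replied
share_supplying = authors_supplying / authors_contacted
share_with_errors = gross_errors / analyses_checked

print(f"replied:           {authors_replied}/{authors_contacted} = {share_replied:.0%}")
print(f"data lost:         {data_lost}/{authors_replied} = {share_lost_among_replies:.0%}")
print(f"supplied data:     {authors_supplying}/{authors_contacted} = {share_supplying:.0%}")
print(f"gross errors:      {gross_errors}/{analyses_checked} = {share_with_errors:.0%}")
```

So roughly a quarter of the authors contacted ended up supplying data, and about three in seven of the analyses that could be checked had a gross error. With only 7 analyses checked, that last share is very imprecise.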

Antony Unwin sent me the link and wrote:

It is an early example of checking published studies, nice and short, and should be better known. In those days it was just a letter. Nowadays it would probably be a full paper with all sorts of references and cross-checking. Things have changed since. How much have they improved?