"Taming the Parallel Effect Zoo" and the PLDI artifact evaluation process
composition.al 2014-06-07
My co-authors Aaron Todd, Sam Tobin-Hochstadt, Ryan Newton, and I have just finished up the camera-ready version of our new paper, “Taming the Parallel Effect Zoo: Extensible Deterministic Parallelism with LVish”, which will appear at PLDI 2014 in Edinburgh this June. In addition to having the paper accepted, we were happy to learn a few weeks ago that our submission passed the PLDI artifact evaluation process.
What this means is that we packaged up and distributed a software artifact that can be downloaded, compiled, and run, and whose output can be compared with the results we presented in the paper, for the sake of repeatability.[1] The claims we made in the paper were deemed repeatable enough that we passed artifact evaluation, which means that we get an extra page in our paper to describe the artifact, a special mention at the conference, and, most importantly, that we get to put a shiny “Artifact Evaluated” badge on our paper.
PLDI was careful to decouple artifact evaluation from paper acceptance or rejection. Only papers that were accepted to the conference were allowed to submit something to the artifact evaluation process, ensuring that there was no penalty for not submitting an artifact. (After all, for many papers, it may not make sense to submit an artifact.) We were told that out of roughly fifty accepted papers at PLDI, twenty submitted artifacts, and, of those, twelve passed artifact evaluation.
Lest I sound smug about having a paper that passed artifact evaluation, let me just say that I wasn’t at all sure that we were going to pass. For one thing, not all of the results in our paper were reproducible as part of the artifact. Some of them, for instance, require using software, such as Intel’s icc, that we weren’t allowed to repackage. Even for the benchmarks that could be run as part of the artifact, the artifact reviewers weren’t always able to get results comparable to ours. However, in each case the discrepancy could (apparently) be explained by the fact that the reviewer had a different hardware configuration, or was running the code in a VM, or suchlike, and the committee’s conclusion was that the artifact was good enough to pass evaluation.
We tried using the Docker container-creation tool to make a “shipping container” for our artifact, usable by anyone with the Docker client installed; this made it a lot easier to distribute the artifact. The reviewers generally liked this method of distribution. Unfortunately, Docker only runs on Linux, which caused trouble for one reviewer, and another reviewer had trouble with Docker requiring root access, which they didn’t have. As a backup, we also provided instructions for building the artifact without Docker, and happily, none of the reviewers reported a problem with compiling and running the artifact that way. Whew!
Although we’ve made our artifact available for whoever might be interested, perhaps more interesting is the LVish Haskell library, which we hope people will make use of in their own projects. As of this writing, the most recent release of LVish is 1.1.2, which unfortunately still lags behind the version described in the paper. We’re still trying to fix some bugs, but if all goes well, we’re hoping to release LVish 2.0 by the time we actually present the paper at PLDI in June. In the meantime, the code is on GitHub, in a branch called “2.0”, if you’re curious. We’ve also updated our lvar-examples repository to include examples that run against both LVish 1.1.2 and LVish 2.0.
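If you want a taste of what programming with LVish looks like before digging into the paper, here’s a minimal sketch in the spirit of the lvar-examples programs: two threads write the same value into a single-assignment IVar, which is fine under LVish’s semantics because repeated writes of the same value don’t conflict. (The module and function names below follow the 1.x API as I remember it, so treat this as a sketch rather than gospel; the details may differ on the 2.0 branch.)

```haskell
-- A sketch in the style of the lvar-examples programs (LVish 1.x API;
-- details may differ in the 2.0 branch).
import Control.LVish       -- Par, Det, runPar, fork
import Data.LVar.IVar      -- single-assignment IVars, the simplest LVar

-- Two threads write the same value into one IVar.  Because an LVar write
-- is a least-upper-bound operation, the second write of 4 is a no-op, so
-- the program is deterministic no matter how the forks are scheduled.
p :: Par Det s Int
p = do
  v <- new
  fork $ put v 4
  fork $ put v 4
  get v            -- blocks until v has been written

main :: IO ()
main = print $ runPar p   -- always prints 4
```

If the two forks tried to write *different* values, the program would raise a write-write conflict error at runtime instead of silently returning a nondeterministic answer.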
Thanks to my co-authors for caring about repeatable research and for the many long hours they spent working on the artifact and on LVish — none of this would have been possible without their help.
[1] I would have said “reproducibility”, but Sam, who knows everything, points out that reproducibility and repeatability are different things, and that what we are demonstrating here is mere repeatability. I’m not entirely sure I understand what reproducibility would mean for us, but repeatability seems to be a prerequisite for it, and so it seems to make sense to aim for repeatability as a start.