Why I don’t “really” practice open science

abernard102@gmail.com 2012-08-20


“I'm a pretty big advocate of anything open -- open source, open access, and open science, in particular. I always have been. And now that I'm a professor, I've been trying to figure out how to actually practice open science effectively.

What is open science? Well, I think of it as talking regularly about my unpublished research on the Internet, generally in my blog or on some other persistent, explicitly public forum. It should be done regularly, and it should be done with a certain amount of depth or self-reflection.

Most of my cool, sexy bloggable work is in bioinformatics; I do have a wet lab, and we're starting to get some neat stuff out of that (incl. both some ascidian evo-devo and some chick transcriptomics) but that's not as mature as the computational stuff I'm doing. And, as you know if you've seen any of my recent posts on this, I'm pretty bullish about the computational work we've been doing: the de novo assembly sequence foo is, frankly, super awesome and seems to solve most of the scaling problems we face in short-read assembly. And it provides a path to solving the problems that it doesn't outright solve. (I'm talking about partitioning and digital normalization.)

While I think we're doing awesome work, I've been uncharacteristically (for me) shy about proselytizing it prior to having papers ready. I occasionally reference it on mailing lists, or in blog posts, or on twitter, but the most I've talked about the details has been in talks -- and I've rarely posted those talks online. When I have, I don't point out the nifty awesomeness in the talks, either, which of course means it goes mostly unnoticed. This seems to be at odds with my oft-loudly stated position that open-everything is the way to go. What's going on??

That's what this blog post is about. I think it sheds some interesting light on how science is actually practiced, and why completely open science might waste a lot of people's time.

I'd like to dedicate this blog post to Greg Wilson.
He and I chat irregularly about research, and he's always seemed interested in what I'm doing but is stymied because I don't talk about it much in public venues. And he's been a bit curious about why. Which made me curious about why. Which led to this blog post, explaining why I think why...

[1] I was really freakin' busy actually getting the stuff to work, not to mention teaching, traveling, and every now and then actually being at home. I was definitely worried about ‘theft’ of ideas. Looking back, this seems a mite ridiculous, but: I'm junior faculty in a fast-moving field. Eeek! I also have a duty to my grads and postdocs to get them published, which wouldn't be helped by being ‘scooped’.
[2] We kept on coming up with new solutions and approaches! Digital normalization didn't exist until August 2011, for example; appropriate de-suckifying of Illumina data took until April or May of 2011; and proving that it all worked was, frankly, quite tough and took until October. (More on this below.)
[3] The code wasn't ready to use, and we hadn't worked out all the right parameters, and I wasn't ready to do the support necessary to address lots of people using the software.

All of these things meant I didn't talk about things openly on my blog. Is this me falling short of ‘open science’ ideals??

In my defense, on the ‘open science’ side:

[1] I gave plenty of invited talks in this period, including a few (one at JGI and one at UMD CBCB) attended by experts who certainly understood everything I was saying, probably better than me.
[2] I posted some of these talks on slideshare.
[3] All of our software development has been done on github, under github.com/ctb/khmer/. It's all open source, available, etc.

...but these are sad excuses for open science. None of these activities really disseminated my research openly. Why?
Well, invited talks by junior faculty like me are largely attended out of curiosity and habit, rather than out of a burning desire to understand what they're doing; odds are, the faculty in question hasn't done anything particularly neat, because if they had, they'd be well known/senior, right? And who the heck goes through other people's random presentations on slideshare? So that's not really dissemination, especially when the talks are given to an in group already.

What about the source code? The ‘but all my source code is available’ dodge is particularly pernicious. Nobody, but nobody, looks at other people's source code in science, unless it's (a) been released, (b) been documented, and (c) claims to solve YOUR EXACT ACTUAL PROBLEM RIGHT NOW RIGHT NOW. The idea that someone is going to come along and swoop your awesome solution out of your repository seems to me to be ridiculous; you'd be lucky to be that relevant, frankly...

What do I think is sufficient for dissemination? In my case, how do you build solutions and write software that actually has an impact, either on the way people think or (even better) on actual practice? And is it compatible with open science?

[1] Write effective solutions to common problems. The code doesn't have to be pretty or ev
08/16/2012, 06:08

From feeds:

Open Access Tracking Project (OATP) » abernard102@gmail.com


oa.new oa.comment oa.green oa.open_science oa.google oa.peer_review oa.arxiv oa.impact oa.quality oa.presentations oa.social_media oa.twitter oa.floss oa.github oa.bioinformatics oa.preprints oa.definitions oa.blogs oa.repositories oa.versions



Date tagged:

08/20/2012, 18:27

Date published:

04/08/2012, 16:13