That time my tweet got cited in a scientific journal

composition.al 2016-04-12

I use Twitter a lot. A couple years ago, I was walking through the woods near my building on campus in Indiana when I saw a gorgeous, shiny green critter that I didn’t recognize. I wanted to know what it was, so I snapped a photo and tweeted about it, asking if anyone could identify it.

Saw this beautiful iridescent green bug today. Anyone know what it is? https://t.co/kTWt8rG594
— Lindsey Kuper (@lindsey) July 1, 2014

One person replied saying that they thought I’d found a green tiger beetle (Cicindela campestris). But Wikipedia seemed to think that Cicindela campestris was only in Europe. Another person thought it was more likely to be a six-spotted tiger beetle (Cicindela sexguttata). The Cicindela sexguttata picture looked like a closer match, and according to Wikipedia, I could expect to see one in Indiana. But the beetle I’d found had eight spots, not six.

“Number of spots not actually guaranteed”, my Twitter correspondent assured me. Indeed, the Wikipedia article at the time said, “some individuals have fewer spots, or none at all”. There was nothing about having more spots, though. I continued digging and discovered an article by environmental educator Kate “BugLady” Redmond that featured a photo of an “over-achieving six-spotted tiger beetle with eight spots”. That was good enough for me: I edited the Wikipedia article to say “more spots, fewer spots, or none at all”, citing the BugLady piece, and was satisfied that my eight-spotted critter was indeed a Cicindela sexguttata.

All this happened on the day of my original tweet, and I didn’t think much more about it after that. But then, last May, I got an email from Stefan Daume, a researcher who studies the use of social media as a data source for ecological monitoring. Stefan explained that he was working on a new article that analyzes biodiversity observations on Twitter, particularly those that lead to species identification. He had implemented a tool that scans Twitter for certain keywords that could be either direct or descriptive references to certain species. My tweet had ended up in his dataset because I’d used the term “green bug”.
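
To make the idea of "direct" versus "descriptive" references concrete, here is a minimal sketch of keyword matching in Python. It is not Ecoveillance's actual code, which isn't public; the term lists and the `classify` function are made up for illustration, and a real system would presumably use far larger vocabularies.

```python
import re

# Hypothetical keyword lists, invented for this sketch.
DIRECT_TERMS = ["tiger beetle", "cicindela sexguttata"]   # names of a species
DESCRIPTIVE_TERMS = ["green bug", "green beetle"]         # loose descriptions


def _compile(terms):
    """Build one case-insensitive pattern matching any term as whole words."""
    alternatives = "|".join(re.escape(term) for term in terms)
    return re.compile(r"\b(?:" + alternatives + r")\b", re.IGNORECASE)


DIRECT = _compile(DIRECT_TERMS)
DESCRIPTIVE = _compile(DESCRIPTIVE_TERMS)


def classify(text):
    """Return 'direct', 'descriptive', or None if no keyword matches."""
    if DIRECT.search(text):
        return "direct"
    if DESCRIPTIVE.search(text):
        return "descriptive"
    return None


tweet = "Saw this beautiful iridescent green bug today. Anyone know what it is?"
print(classify(tweet))  # descriptive
```

Under these assumed term lists, the tweet above is caught only by the descriptive list, which matches Stefan's explanation of how it ended up in the dataset.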

Stefan asked permission to use my tweet in his article, because, he said, it was “quite representative and has a very nice picture indeed.” I gave permission, of course, and as of a couple weeks ago, Stefan’s article, “‘Anyone Know What Species This Is?’ – Twitter Conversations as Embryonic Citizen Science Communities”, has appeared in PLOS ONE. Hooray!

Here’s the part of the article that cites my tweet (it’s reference [38]):

[38] is given as a representative reference to a sample Tweet triggering a Twitter conversation (replies by other Twitter users) and holding a broad range of the information we analysed (such as an image of an observed insect embedded in the Tweet, replies from other Twitter users with multiple suggested species determinations, including URL links to taxonomic verification resources and geo-location references).

The researchers analyzed about 200 examples of “biodiversity observations with species determination requests” in the form of tweets with accompanying photos. They found that 64% of these tweets generated replies, 86% of which contained at least one suggested species determination, of which 76% turned out to be correct. Most of the determination requesters and determination providers alike were either “‘incidental’ biologists” with no particular domain knowledge (that’d be me), or “general nature enthusiasts” with a strong personal but not professional interest. The researchers concluded that the participants in these conversations “can be viewed as implicit or embryonic citizen science communities which have to offer valuable contributions both as an opportunistic data source in ecological monitoring as well as potential active contributors to citizen science programmes.”

The idea that people’s idle tweets could be used for ecological monitoring might seem a bit far-fetched. But I’m reminded of a time several years ago when an entomologist acquaintance of mine was very interested in finding out exactly when a particular Google Street View photo had been taken, because he’d somehow noticed that the photo just happened to contain three plants that were home to a particular invasive species that he and his colleagues had been working to stamp out. We were never able to figure out the date (or even the month) the photo had been taken. A tweet, on the other hand, would have been automatically tagged with a date and might have been a lot more useful to him. If experts had tools that they could use to find such tweets (whether or not the tweets contained intentional species-determination requests, as the ones in this study did), they might be better able to do their work. I’m sure that image processing would have to be an important part of such tools, but text processing is certainly a start.

Both the PLOS ONE article and the collected tweets that the authors studied are freely available.¹ The only thing I’d ask is for Ecoveillance, the Twitter-scanning system that Stefan implemented, to also be open source. I hope that happens eventually!

In the case of the tweets, the authors provide only a CSV of tweet ID numbers, which, because of Twitter’s policies, is all they’re allowed to provide. You can recover the original tweet with the URL https://twitter.com/statuses/id, replacing id with the ID number (502851804502425600, for instance). Stefan told me that they had wanted to also include an actual screenshot of my tweet in the article, but to do so they’d need not only my permission, but permission from Twitter itself, and alas, Twitter doesn’t respond to such requests. ↩
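
If you want to turn a whole file of IDs into links, a few lines of Python will do it. This is only a sketch: the layout of the published CSV is an assumption here (one ID per row in the first column, possibly with a header line), and `tweet_urls` is a name made up for this example.

```python
import csv
import io

BASE_URL = "https://twitter.com/statuses/"


def tweet_urls(csv_file, id_column=0):
    """Yield a tweet URL for each row whose ID column is all digits.

    Assumes IDs live in `id_column`; non-numeric cells (such as a header)
    and empty rows are skipped.
    """
    for row in csv.reader(csv_file):
        if not row:
            continue
        tweet_id = row[id_column].strip()
        if tweet_id.isdigit():
            yield BASE_URL + tweet_id


# Stand-in for the real file; with a file on disk you would pass
# open("ids.csv", newline="") instead (the filename is hypothetical).
sample = io.StringIO("tweet_id\n502851804502425600\n")
urls = list(tweet_urls(sample))
print(urls)
```

Tweet IDs are kept as strings throughout, so they are never at risk of being rounded by a tool that treats them as floating-point numbers.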