Pushing costs upstream and risks downstream: Making a journal publisher profitable


I’m not quite sure exactly what the trigger was, but there was a recent flare-up of the good old “how much does it cost to publish a scholarly article” discussion, partly driven by the Guardian article from last month on the history of 20th-century scholarly publishing. But really this conversation just rumbles along, flaring up whenever someone does a new calculation that arrives at a new number, or some stakeholder makes an assertion about “value”.

Probably the most sensible thing said in the most recent flare-up was, in my view, this from Euan Adie:

There isn't a "true cost" of publishing an article independent of context. More things with no true costs: bananas, newspapers, tweets

— Euan Adie (@Stew) July 7, 2017

There is a real problem with assigning a “true” cost. There are many reasons for this. One is the obvious, but usually most contentious, question of what to include. The second issue is the question of whether a single cost “per-article” even makes sense. I think it doesn’t, because the costs of managing articles vary wildly even within a single journal, so the mean is not a very useful guide. On top of this is the question of whether the marginal costs per article are actually the major component of rising costs. Once upon a time that was probably true. I think it’s becoming less true with new platforms and systems. The costs of production, typesetting and format shifting for the web, should continue to fall with automation. The costs of running a platform tend to scale more with availability requirements and user-base than with the scale of content. And if you’re smart as a publisher you can (and should) be reducing the costs of managing peer review per article by building an internal infrastructure that automates as much as possible and manages the data resources as a whole. This shifts costs to the platform and away from each article, meaning that the benefits of scale are more fully realised. Dig beyond the surface outrage at Elsevier’s patent on waterfall peer review processes and you’ll see this kind of thinking at work.
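
To make the arithmetic concrete, here is a toy sketch, with every number in it invented, of a journal where a fixed platform cost is shared across all articles and per-article handling costs are heavily skewed. The mean marginal cost sits well above the median, so a single “per-article cost” hides more than it reveals, and the all-in per-article figure is driven mostly by how many articles the platform carries.

```python
import random
import statistics

random.seed(1)

PLATFORM_COST = 500_000   # hypothetical fixed annual platform cost

def marginal_cost():
    """Invented heavy-tailed per-article handling cost: most articles are
    routine, a small fraction soak up a lot of editorial labour."""
    if random.random() < 0.9:
        return max(random.gauss(150, 30), 0)      # routine article
    return max(random.gauss(2500, 800), 0)        # expensive long-tail article

for volume in (1_000, 10_000, 100_000):
    costs = [marginal_cost() for _ in range(volume)]
    all_in = (PLATFORM_COST + sum(costs)) / volume
    print(f"{volume:>7} articles | mean marginal {statistics.mean(costs):6.0f} "
          f"| median marginal {statistics.median(costs):5.0f} "
          f"| all-in per article {all_in:6.0f}")
```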

Put those two things together and you reach an important realisation: driving per-article costs into platform costs makes good business sense, and as you do this the long tail of articles that are expensive to handle will come to dominate per-article costs. This leads to a further important insight: an increasingly important part of a profitable journal’s strategy will be to prevent the submission of high-cost articles. Add back in the additional point that there is no agreement on a set of core services that publishers provide and you see another important part of the strategy. A good way to minimise both per-article and platform costs is to minimise the services provided, and to seek to ensure those services handle the “right kind of articles” as efficiently as possible.
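
A similarly hand-waving sketch of the screening point: if a cheap triage step keeps most of the expensive long tail out of the workflow altogether, the average cost per article handled drops sharply. The triage cost, recall and handling figures below are all assumptions for illustration, not real publisher numbers.

```python
import random

random.seed(2)

def handling_cost(expensive: bool) -> float:
    # Invented figures: routine articles versus the expensive long tail
    return max(random.gauss(2500, 800) if expensive else random.gauss(150, 30), 0)

def average_cost(triage: bool, n: int = 50_000, tail_fraction: float = 0.1,
                 triage_cost: float = 20.0, triage_recall: float = 0.8) -> float:
    """Mean cost per article handled, with or without a cheap triage step
    that (by assumption) catches 80% of likely-expensive submissions."""
    total, handled = 0.0, 0
    for _ in range(n):
        expensive = random.random() < tail_fraction
        if triage:
            total += triage_cost                     # every submission is screened
            if expensive and random.random() < triage_recall:
                continue                             # desk-rejected before any real cost
        total += handling_cost(expensive)
        handled += 1
    return total / handled

print(f"no triage  : {average_cost(False):6.0f} per article handled")
print(f"with triage: {average_cost(True):6.0f} per article handled")
```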

Pushing costs upstream

Author Selectivity

Nature, the journal, has one of the best business models on the planet and, although figures are not publicly available, is wildly profitable. The real secret to its success is two-fold. Firstly, a highly trained set of professional editors are efficient at spotting things that “don’t fit”. There is plenty of criticism as to whether they’re good at spotting articles that are “interesting” or “important”, but there is no question that they are effective at selecting a subset of articles that are worth the publishing company investing in. The second contributor is author selectivity. Authors do the work for the journal by selecting only their “best” stuff to send. Some people, of course, think that all their work is better than everyone else’s, but they in turn are usually that arrogant because they have the kind of name that gets them past the triage process anyway. The Matthew effect works for citations as well.

It may seem counter-intuitive, but the early success of PLOS ONE was built on a similar kind of selectivity. PLOS ONE launched with a splash that meant many people were aware of it, but the idea of “refereeing for soundness” was untested, and perhaps more importantly not widely understood. And of course it was not indexed and had no Journal Impact Factor. The subset of authors submitting were therefore those who a) had funds to pay an APC, b) could afford to take a punt on a new, unproven journal, and c) had a paper that they were confident was “sound”. Systematically, this group, choosing articles they were sure were “sound”, would select good articles. When ONE was indexed it debuted with a quite high JIF (yes, I’m using a JIF, but for the purpose it is intended for: describing a journal) which immediately began to slide as the selectivity of submission decreased. By definition, the selectivity of the peer review process didn’t (or at least shouldn’t) change.
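
The selection effect is easy to mimic with a toy simulation (all parameters invented): authors have only a noisy view of how citable their own work is, and submit only when that self-assessment clears some threshold. As the threshold drops, the average citations of what gets published falls, even though the journal’s own “soundness” bar never moves.

```python
import random
import statistics

random.seed(3)

def mean_citations(submission_threshold: float, n: int = 20_000) -> float:
    """Average citations of published papers when authors only submit work they
    judge (noisily) to be above `submission_threshold`; the journal's own bar
    ('soundness') never changes."""
    published = []
    for _ in range(n):
        quality = random.gauss(0, 1)                      # latent 'citability'
        self_assessment = quality + random.gauss(0, 0.5)  # authors see it imperfectly
        if self_assessment < submission_threshold:
            continue                                      # author never submits
        published.append(max(random.gauss(10 + 8 * quality, 5), 0))
    return statistics.mean(published)

# Early days: only confident authors submit. Later: more or less everyone does.
for threshold in (1.5, 0.5, -2.0):
    print(f"submission threshold {threshold:+.1f} -> "
          f"mean citations of published papers {mean_citations(threshold):5.1f}")
```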

Anecdotally, my own submissions follow exactly this path. I’ve never had a research paper sent to be refereed at Nature; every one was rejected without review. Each of those articles went on to get a lot of citations – several are considered important in their field (although no-one ever really got the thermal equivalence thing, shame really). Similarly, the first article my group sent to PLOS ONE (after rejection at several other journals) hasn’t done too badly either (as an aside, you can get to the articles via Scholar, I don’t want to artificially inflate altmetrics by linking directly). There’s an irony here. One of the ways to maximise the “quality” of articles a journal gets is to avoid being indexed, but have a profile amongst traditional academics in North America and Europe, and publish fast. That way you will systematically get submissions people think are good, but have been rejected by other journals. These are systematically likely to include important articles unrecognised by more conventional journals.

If you follow this logic through, then having systems to rapidly reject papers that are likely to be problematic makes sense. Some journals have internal (but obviously secret) systems that reject everything from specific countries (one large publisher had an automatic rejection for everything from Iran at one point) or everything that fits specific patterns (GWAS studies that link the up- or down-regulation of single genes, or small numbers of genes, to a specific stimulus). Many editorial processes apply various forms of conscious or unconscious bias to “positive” effect here, rejecting articles from unknown institutions.

Technical measures

It isn’t just editorial processes that push costs upstream. It’s also possible to get authors to do extra work by technical means. The most obvious of these is journal submission systems. Once upon a time all the metadata for an article would have been extracted by the publisher or an indexer from the manuscript itself. Submissions involved some number of hard copies and a cover letter being posted. Today the many hours of filling in forms that are typical of submission systems place that work onto the authors’ shoulders.

Similarly, the many, many efforts to get authors to use publisher-supported authoring tools seek the same goal. If successful, which they generally are not, they would push work out of the publisher workflow and upstream to authors. One of the problems that has arisen with this is that authoring technologies keep moving faster than publishing ones. In some cases there are journals that successfully use the output of LaTeX directly, but these are the exceptions.

The best example remains that of the IUCr journals, which essentially provide web-based forms to upload standard data formats and small chunks of text into predefined sections. By obtaining structured data from authors in a standardised format, the journals reduce their publishing costs massively. This is a near-ideal case, where many reports that are structurally very similar can be supported by a platform. Doing this for articles in general is clearly much harder and there are no good examples of it that I’m aware of for articles in other fields (some fields have side-stepped this for particular data types, but that’s different in subtle but important ways).
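
This is not IUCr’s actual schema or tooling, just a minimal sketch of the general idea: if authors supply standardised fields and a standard-format data deposit rather than a free-form manuscript, a submission can be machine-validated and production becomes largely mechanical. The field names, limits and file-format check here are illustrative assumptions.

```python
from dataclasses import dataclass, field

# Hypothetical structured-report schema, loosely inspired by the idea behind
# the IUCr workflow (not their actual format): authors supply standardised
# fields and a data file rather than a free-form manuscript, so production
# can be automated.
@dataclass
class StructuredReport:
    title: str
    authors: list[str]
    data_file: str                      # path to a standard-format data deposit
    abstract: str
    comment: str = ""                   # short free-text section, length-capped
    sections: dict[str, str] = field(default_factory=dict)

    def validate(self) -> list[str]:
        """Return a list of problems; an empty list means production can
        proceed without manual intervention."""
        problems = []
        if not self.title.strip():
            problems.append("missing title")
        if not self.authors:
            problems.append("at least one author required")
        if not self.data_file.endswith(".cif"):      # assumed standard data format
            problems.append("data deposit must be a .cif file")
        if len(self.abstract.split()) > 250:
            problems.append("abstract exceeds 250 words")
        if len(self.comment.split()) > 500:
            problems.append("comment section exceeds 500 words")
        return problems

report = StructuredReport(
    title="Crystal structure of a hypothetical compound",
    authors=["A. Author"],
    data_file="structure.cif",
    abstract="A short structured abstract...",
)
print(report.validate() or "ready for automated production")
```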

Obviously there are limits here. It’s not feasible to simply impose new technical requirements and expect all authors to acquiesce. And as noted previously in this quasi-series, there are challenges relating to the control that submission system vendors hold over publisher workflows. Nonetheless this is an area where publisher costs can, at least in theory, be substantially reduced.

Pushing costs (and risks) downstream

If one solution to reducing costs is pushing them upstream, then another is clearly pushing them downstream. This can mean a range of things – costs to authors, funders, institutions, or “the system”. As these are often not obviously direct cash costs they can be diffuse. These are the costs of the publisher (or really of any actor in the publishing workflow) not doing something.

An extreme example of this is so-called “predatory” publishers that do not do peer review. To simplify the example we can put aside the question of whether publishers “add value” through peer review and treat this simply as a question of validation and certification. In a world where we assume that being published in “a journal” means an article has been peer reviewed, what (and where) are the costs of that not happening?

In a recent article, Eve and Priego (2017) have dissected the question of who is harmed. In particular they note that the harm, largely in the form of risk, falls mainly on those relying on “publication” in “a journal” as a quality indicator or a certification of peer review. There is an accumulating risk of being exploited by unscrupulous authors for systems and institutions that rely on such markers. The cost may materialise either as wastage when a poor decision is made (to hire someone or to fund a project) or more directly as the cost of replicating the validation process that was supposed to occur. Whatever you may have thought of Beall’s list, and I really didn’t think much of it, it was a product of donated labour. When it folded, many of the evaluation systems that had come to depend on it discovered those costs and had to find a way to cover them. What is worth noting is that the more institutions rely on such markers (including eponymous lists) as reliable proxies, the greater the risk and the greater the cost when problems arise.

This is a toy example, albeit one that really did play out in practice. The risks also run the other way, as Eve and Priego note: the loss of perfectly good work that is published in an otherwise unreliable venue. Similar kinds of cost and loss play out across the publishing system wherever venue is relied upon as a proxy. At one extreme are the costs (financial, and in terms of mortality and morbidity) of cases such as the Wakefield paper, assumed to be reliable because it was published in The Lancet; at the other, highly promising young researchers neglected (and the investment in their careers wasted) because they happen not to have the required paper in the right journals, or book with the right publisher.

There are many examples where publishers have to make a choice to internalise costs or push the risk downstream. Checking for image manipulation is a good example. It remains costly, and in most cases prohibitively so, to check every submitted image, so spot checking is common, as is restricting checks to accepted articles. But that passes the cost downstream, either to the next journal in the submission chain, or to the community of readers. Readers may even be unable to check images because of the format shifting and resolution changes made in the production process to reduce publisher costs. Unfortunately most of these decisions are made quietly, without any publicity. It remains almost impossible to determine, across journals, exactly what checks are carried out, with what frequency, and at which stage.
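
As a back-of-envelope illustration of where those costs land (all figures below are invented, not real publisher numbers): checking everything is a cost the publisher carries; spot checking trades that for an expected downstream cost borne by readers and by the next journal in the chain. The cost never disappears, it just moves.

```python
# Back-of-envelope comparison (all figures invented) of image-check costs:
# checking everything is a publisher cost; spot checking shifts the expected
# cost of missed manipulation downstream.
SUBMISSIONS = 10_000
IMAGES_PER_PAPER = 5
CHECK_COST = 3.0            # hypothetical cost to screen one image
MANIPULATION_RATE = 0.01    # assumed fraction of papers with a problem image
DOWNSTREAM_COST = 5_000.0   # assumed cost per problem that reaches readers

def costs(check_fraction: float) -> tuple[float, float]:
    """(publisher cost, expected downstream cost) for a given spot-check rate."""
    publisher = SUBMISSIONS * check_fraction * IMAGES_PER_PAPER * CHECK_COST
    missed = SUBMISSIONS * MANIPULATION_RATE * (1 - check_fraction)
    return publisher, missed * DOWNSTREAM_COST

for fraction in (1.0, 0.2, 0.0):
    publisher, downstream = costs(fraction)
    print(f"check {fraction:>4.0%} of papers: publisher pays {publisher:>9,.0f}, "
          f"expected downstream cost {downstream:>9,.0f}")
```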

What does this mean?

First and foremost, don’t take anyone who says “the cost of an article is [or should be] X” seriously unless they are very, very clear on exactly what parts of the process of publishing they are talking about and the context in which those parts are taking place. Second, we need to get much more explicit about what services are being provided. There have been a range of efforts to improve on this, but they’ve not really taken off, at least in part because they’re not really in the interests of the big publishers, who benefit the most from playing the games I’ve described here. As I’ve argued elsewhere, we need a much better understanding of how shifting costs from the publisher’s balance sheet to the academic worker, or to institutional infrastructure, plays out. How do the cash costs of managing a strong quality assurance process within a publisher compare when they become the non-cash costs of an academic editor’s, or a referee’s, time? When they become the time wasted by researchers on following up fatally flawed research?

All of the above relates to costs, and none of it relates to price. We also need a better understanding of the political economics of pricing. In particular, we need to understand where we are creating a luxury goods market in which price (either in work to get through the selection process or in money) drives the perception of brand, which reinforces the perceived value of the investment, and therefore leads to an increase in price. Be very wary of claims that publishing is a “value driven market”. This is both a way of avoiding a discussion of internal costs, which in some cases are outrageous, and of avoiding the discussion about precisely which services are being provided. It’s also a captive market. Another value-based market is plane flights out of Florida this week…

Above all, we need a much more sophisticated understanding and far better models of how costs are being distributed across the system, in cash and non-cash costs, in labour and in capital. Overall we can point at the direction of travel, and at the problems with existing pricing and cost structures. But it’s hard to make usable predictions or good design decisions without a lot more data. Getting that data out of the system as it exists at the moment will be hard, but the direction of travel is positive. There’s nothing wrong with publishers making a surplus, but any conversation about what kind of surplus is “appropriate” needs to be based on a much deeper understanding of all the costs in the system.