Statistical Modeling, Causal Inference, and Social Science 2025-02-07
Mark Tuttle shares this quote from Leslie Lenert:
Deep fakes threaten truth everywhere, but particularly in science. Content provenance means data provenance, information provenance and knowledge provenance. It is the chain of provenance that will establish what is truth.
I don’t know about that, but I thought I’d share it with you. It reminds me a bit of our discussion of how plagiarism is often used to mask misunderstanding.
Remember when the prominent statistics professor Ed Wegman copied a Wikipedia article for a paper he claimed to be writing, and, in doing the copying, introduced errors? I don’t think he ever apologized for that! Nowadays, I guess he wouldn’t even bother with Wikipedia; he’d just copy some chatbot output that was in part trained on Wikipedia.
Of course, if all Weggy’s gonna do is copy over some chatbot output, then the journal editors could cut out the middleman and run the chatbot and print its output themselves. Or the readers of the journal could cut out the middleman and save $1400-$2800 each by just running the chatbot themselves.
The bad thing about the article being produced as it was is that:
(a) at best it adds nothing to the literature, being a repeat of something that’s much more easily accessible on Wikipedia;
(b) it introduced an error, so it was actually worse than Wikipedia;
(c) it appeared in an authoritative-looking source, which could mislead the few people who read it. I just looked up the article on Google Scholar and it has 6 citations, which isn’t a lot, but it ain’t zero either; I’ve published lots and lots of articles that have fewer than 6 citations!;
(d) it rewarded and thus continued to incentivize scholarly misconduct; and
(e) various people around the world were paying $1400-$2800 for this crap.
The only reason anyone would prefer this article to the Wikipedia version is that it had the imprimatur of a prominent statistics professor. But that’s a bad thing, not a good thing, given that the article contained errors. Also, the article didn’t credit the Wikipedia article it was ripping off, so there was no easy way for readers to track down the good stuff that was diluted and adulterated to prepare this particular brew.
It was the same story when Chrissy Hesse, a prominent math professor (I heard a rumor that, several decades ago, he was the third-youngest math professor in Germany!), copied chess stories without clear attribution for his book, again introducing errors while covering his tracks, which made it harder for readers to learn the truth. That was just chess, and Wegman was trying to derail climate change research, but it’s the same general idea.
Prospects for the future
Recall Lenert’s above-quoted remark. Going forward, there should be less demand for noisy aggregators such as Wegman and Hesse, because the job of “noisy aggregation of material already on the internet” is already done well by chatbots. On the other hand, there could be more demand for credentialed authorities such as Wegman and Hesse to give their stamp of approval to such summaries. At best, this would mean the authority figures running the chatbot themselves and then carefully reading and editing the output before endorsing it. At worst, they could run the chatbot, or outsource that menial-seeming job to a student, and not read the output at all before endorsing it. And they would never need to admit it came from a chatbot, just as they never admitted which particular sources they had ripped off before.
If you can fake the text, you can fake the chain of provenance too. And a demand for authority figures would seem to create its own perverse incentives. So I’m concerned.