Discerning Signal from Noise: Navigating the Flood of AI-Generated Prior Art

Patently-O (April 30, 2024)

by Dennis Crouch

This article explores the impact of Generative AI on prior art and potential revisions to patent examination standards to address the rising tidal wave of AI-generated, often speculative, disclosures that could undermine the patent system’s integrity.

The core task of patent examination is identifying quality prior art.  References must be sufficiently accessible, clear, and enabling to serve as legitimate evidence of what was previously known.  Although documents are more widely available than ever through our vast network of digital communications, there is also increasing junk in the system: documents making unsubstantiated claims that are effectively science fiction.  Patent offices prefer patent documents as prior art because they are drafted to meet strict enablement standards and filed with sworn veracity statements. Issued patents take this a step further, carrying the imprimatur of successful examination.  Many of us learned the mantra that “a prior art reference is only good for what it discloses.”  But in our expanding world of deep fakes, intentional and otherwise, is face value still worth much?

In a new request for comments (RFC), the USPTO has asked the public to weigh in on these issues, focusing in particular on the impact of generative artificial intelligence (GenAI) on prior art. Services like All Prior Art are using AI to churn out and ‘publish’ many millions of generated texts, hoping that some will preempt future patent applications.  See my 2014 post.  These disclosures are often obscure, ambiguous, and technically deficient, and they do nothing to promote the progress of the useful arts. Still, they seemingly qualify as prior art under 35 U.S.C. § 102, and they are presumed to be enabling.

A key feature of prior art analysis is the presumption that a reference is operable and self-enabling. In other words, the USPTO and courts generally assume that a document qualifying as prior art contains sufficient detail to enable a person skilled in the art to practice the subject matter disclosed.  Consider, for example, a conference paper that sketches a conceptual gearbox design but omits specific gear ratios and material specifications.  Despite those technical gaps, the presumption of enablement means the examiner treats the document as if it teaches a skilled engineer how to make and use the described gearbox.  This presumption facilitates the examination process by shifting the burden to the patent applicant to prove otherwise if they contest the prior art’s completeness or applicability.  It also pushes applicants to ensure that their claims include limitations not found in the references, something that is becoming increasingly difficult as the universe of prior art disclosures continues to expand.

The presumption that prior art is enabled developed incrementally. It began with the notion that issued patents had been examined and that the claimed subject matter was therefore properly enabled; it was then expanded to include unclaimed material in issued patents. Amgen Inc. v. Hoechst Marion Roussel, Inc., 314 F.3d 1313, 1354 (Fed. Cir. 2003).  Later, the Federal Circuit extended the presumption to all publications. In re Antor Media Corp., 689 F.3d 1282, 1287 (Fed. Cir. 2012).  Although the presumption can be overcome with “persuasive evidence” that the reference is not enabling, it remains difficult to prove a negative. Still, I would like to see some empirical evidence that helps me understand the severity of this issue.

In its Request for Comments, the USPTO poses a series of fifteen questions to stakeholders, including:

  • Whether AI-generated disclosures qualify as “prior art” under 35 U.S.C. § 102, and whether such treatment should depend on the degree of human involvement or curation.
  • How to handle the potentially enormous volume of AI-generated prior art, and its impact on patent examination.
  • Whether the presumption of enablement for prior art is warranted for AI disclosures.
  • How AI prior art affects the assessment of obviousness under 35 U.S.C. § 103 and the analysis of a “person having ordinary skill in the art.”
  • What new USPTO examination guidance or statutory changes may be needed.

In my mind, the fact that an AI generated the prior art is not a worry, so long as the work advanced the art.  Rather, my real concern is that the outflow of publications – many of which are senseless – will gum up the patent system in nefarious ways.

1. Abundance of unread AI-generated data: Generative AI systems are producing vast amounts of synthetic data that is “published” in the sense of being made publicly available, but much of it may never be read or reviewed by humans.  For example, I estimate that over the past two years AI systems have spit out more text content than the total of all prior human-written publications.   Although most of this content will never be human-read, AI-generated publications will be accessible to other AI systems, which can then incorporate the synthetic learning.

2. Fictional nature of AI-generated content: Many AI systems, especially large language models, are prone to generating content that is essentially “science fiction” – i.e., plausible-sounding but factually incorrect.  These outputs typically include major gaps in reasoning or explanation that divorce the disclosure from human-shared reality. This tendency toward hallucination raises doubts in my mind about the reliability and enablement of AI-generated disclosures.

3. Motivation to Combine – an AI Strength: One of AI’s strengths is its ability to identify patterns and connections across diverse domains of information. This could expand the universe of “analogous art” for purposes of obviousness analysis under 35 U.S.C. § 103 and could make it easier to find motivations to combine prior art references. At the same time, the connections made by AI may sometimes be spurious or nonsensical to human experts.

These issues make me think about the enablement and written description requirements more generally. It is improper for a patentee to claim a genus based upon a disclosure that includes a large number of inoperative species; such a claim requires too much follow-on research and so does not sufficiently disclose the invention. We have a parallel situation with AI prior art, at both the individual and the collective level. Collectively, AI systems are creating many worthless disclosures, but by generating billions of them they are bound to hit upon good ones as well. Still, those will likely go unrecognized until some later date, when a human truly invents the same thing but is blocked from patenting it.  The difficulty lies in distinguishing between legitimate insights and spurious connections.

Ben Hattenbach & Joshua Glucoft have an interesting 2015 article on point. Patents in an Era of Infinite Monkeys and Artificial Intelligence, 19 Stan. Tech. L. Rev. 32, 42 (2015).  The authors offer some support for my analysis above. Focusing on claim language, they argued: “if a computer published millions of variations of claims such that all but a few were useless from a technical or grammatical perspective, then it would be easier to justify not requiring inventors to account for that sea of information.”  On the other hand, “if a computer generated a focused set of high-quality variations on claim language, then it would be easier to justify folding such knowledge into the scope of the prior art.”  The suggestion, then, is some sort of balancing test that considers both quality and accessibility. This differs from our current approach, which operates much more like an on-off switch.  The authors also caution against automatically extending the presumption of enablement to all AI-generated disclosures.  The presumption exists for traditional publications because we assume the authors intend to fully disclose a working invention.  But with much of today’s AI-generated content, especially that churned out at huge scale with minimal human curation, that assumption likely does not hold.

In his student note, Lucas Yordy focuses on some of the same issues and argues that AI-generated disclosures may decrease the patent incentive to research and disclose. The Library of Babel for Prior Art: Using Artificial Intelligence to Mass Produce Prior Art in Patent Law, 74 Vand. L. Rev. 521 (2021).  Yordy also notes the problem identified in the RFC: current patent law doctrines are ill-equipped to prevent AI-generated disclosures from rendering deserving inventions unpatentable. Like others, he calls out the enablement requirement as problematic, but he goes on to propose a “conception” requirement for prior art to “ensure that AI-generated disclosures have actually contributed to public knowledge and have undergone some evaluation before they can render an invention unpatentable.”

In 2022, Lidiya Mishchenko published an article that went even further, arguing that even unexamined patent applications are causing prior art problems: they “occupy the patent idea space and can lead to examination [errors] and third-party search errors,” thus “contribut[ing] to costly unpredictability in the patent system more broadly by preventing others from getting a patent and by creating a temporary cloud of uncertainty.” Lidiya Mishchenko, Thank You for Not Publishing (Unexamined Patent Applications), 47 B.Y.U. L. Rev. 1563, 1564 (2022); see also Michael McLaughlin, Computer-Generated Inventions, 101 J. Pat. & Trademark Off. Soc’y 224, 239 (2019).

The USPTO is accepting comments in response to the RFC until September 29, 2024 via regulations.gov.

What are your thoughts? Should Generative AI cause us to rethink prior art?