New Evidence That AI Can Scheme and Deceive

beSpacific 2025-04-22

Skeptic: “The recent announcement of the Stargate Project, a $500 billion initiative led by OpenAI, Oracle, SoftBank, and MGX, underscores the rapid advances in artificial intelligence (AI) infrastructure and capabilities. While such developments hold immense potential, they also introduce critical security challenges, particularly concerning the potential for AI systems to deceive users. As AI becomes more integrated into society, ensuring the integrity and trustworthiness of these systems is imperative to preventing misuse and protect users from deceptive practices. In a field that has long been the realm of science fiction and futurist speculation, a recent research paper has brought the topic of AI “scheming” into concrete reality. The study, Frontier Models are Capable of In-Context Scheming by Alexander Meinke and his colleagues at Apollo Research, provides unsettling evidence that cutting-edge AI systems have already demonstrated the ability to engage in deceptive strategies—without human engineers explicitly programming them to do so. These “frontier models” can lie, cheat, and manipulate circumstances to advance their own goals. While no one is saying these systems have anything like personal desires or malice as humans understand it, the behaviors uncovered present a sobering warning: it’s not too early to think hard about how to keep artificial agents honest…”