Openness on OpenAI

Shtetl-Optimized 2024-05-21

I am, of course, sad that Jan Leike and Ilya Sutskever, the two central people who recruited me to OpenAI and then served as my “bosses” there—two people for whom I developed tremendous admiration—have both now resigned from the company. Ilya’s resignation followed the board drama six months ago, but Jan’s resignation last week came as a shock to me and others. The Superalignment team, which Jan and Ilya led and which I was part of, is being split up and merged into other teams at OpenAI.

See here for Ilya’s parting statement, and here for Jan’s. See here for Zvi Mowshowitz’s perspective and summary of reporting on these events. For additional takes, see pretty much the entire rest of the nerd Internet.

As for me? My two-year leave at OpenAI was scheduled to end this summer anyway. It seems pretty clear that I ought to spend my remaining months at OpenAI simply doing my best for AI safety—for example, by shepherding watermarking toward deployment. After a long delay, I’m gratified that interest in watermarking has spiked recently, not only within OpenAI and other companies but among legislative bodies in the US and Europe.

And afterwards? I’ll certainly continue thinking about how AI is changing the world and how (if at all) we can steer its development to avoid catastrophes, because how could I not think about that? I spent 15 years mostly avoiding the subject, which now seems like a huge mistake, and probably enough of that mistake for one lifetime.

So I’ll continue looking for juicy open problems in complexity theory that are motivated by interpretability, or scalable oversight, or dangerous capability evaluations, or other aspects of AI safety—I’ve already identified a few such problems! And without giving up on quantum computing (because how could I?), I expect to reorient at least some of my academic work toward problems at the interface of theoretical computer science and AI safety, and to recruit students who want to work on those problems, and to apply for grants about them. And I’ll presumably continue giving talks about this stuff, and doing podcasts and panels and so on—anyway, as long as people keep asking me to!

And I’ll be open to future sabbaticals or consulting arrangements with AI organizations, like the one I’ve done at OpenAI. But I expect that my main identity will always be as an academic. Certainly I never want to be in a position where I have to speak for an organization rather than myself, or censor what I can say in public about the central problems I’m working on, or sign a nondisparagement agreement or anything of the kind.

I can tell you this: in two years at OpenAI, hanging out at the office and meeting the leadership and rank-and-file engineers, I never once found a smoke-filled room where they laugh at all the rubes who take the talk about “safety” and “alignment” seriously. While my interactions were admittedly skewed toward safetyists, the OpenAI folks I met were invariably smart and earnest and dead serious about the mission of getting AI right for humankind.

It’s more than fair for outsiders to ask whether that’s enough, whether even good intentions can survive bad incentives. It’s likewise fair of them to ask: what fraction of compute and other resources ought to be set aside for alignment research? What exactly should OpenAI do on alignment going forward? What should governments force them and other AI companies to do? What should employees and ex-employees be allowed, or encouraged, to share publicly?

I don’t know the answers to these questions, but if you do, feel free to tell me in the comments!