
#159 – Jan Leike on OpenAI's massive push to make superintelligence safe in 4 years or less

From 80,000 Hours Podcast



Length:
171 minutes
Released:
Aug 7, 2023
Format:
Podcast episode

Description

In July, OpenAI announced a new team and project: Superalignment. The goal is to figure out how to make superintelligent AI systems aligned and safe to use within four years, and the lab is putting a massive 20% of its computational resources behind the effort.

Today's guest, Jan Leike, is Head of Alignment at OpenAI and will be co-leading the project. As OpenAI puts it, "...the vast power of superintelligence could be very dangerous, and lead to the disempowerment of humanity or even human extinction. ... Currently, we don't have a solution for steering or controlling a potentially superintelligent AI, and preventing it from going rogue."

Links to learn more, summary and full transcript.

Given that OpenAI is in the business of developing superintelligent AI, it sees that as a scary problem that urgently has to be fixed. So it's not just throwing compute at the problem -- it's also hiring dozens of scientists and engineers to build out the Superalignment team.

Plenty of people are pessimistic that this can be done at all, let alone in four years. But Jan is guardedly optimistic. As he explains: "Honestly, it really feels like we have a real angle of attack on the problem that we can actually iterate on... and I think it's pretty likely going to work, actually. And that's really, really wild, and it's really exciting. It's like we have this hard problem that we've been talking about for years and years and years, and now we have a real shot at actually solving it. And that'd be so good if we did."

Jan thinks that this work is actually the most scientifically interesting part of machine learning. Rather than just throwing more chips and more data at a training run, this work requires actually understanding how these models work and how they think. The answers are likely to be breakthroughs on the level of solving the mysteries of the human brain.

The plan, in a nutshell, is to get AI to help us solve alignment.
That might sound a bit crazy -- as one person described it, "like using one fire to put out another fire."

But Jan's thinking is this: the core problem is that AI capabilities will keep getting better and the challenge of monitoring cutting-edge models will keep getting harder, while human intelligence stays more or less the same. To have any hope of ensuring safety, we need our ability to monitor, understand, and design ML models to advance at the same pace as the complexity of the models themselves. And there's an obvious way to do that: get AI to do most of the work, such that the sophistication of the AIs that need aligning, and the sophistication of the AIs doing the aligning, advance in lockstep.

Jan doesn't want to produce machine learning models capable of doing ML research. But such models are coming, whether we like it or not. And at that point Jan wants to make sure we turn them towards useful alignment and safety work, as much or more than we use them to advance AI capabilities.

Jan thinks it's so crazy it just might work. But some critics think it's simply crazy. They ask a wide range of difficult questions, including:
If you don't know how to solve alignment, how can you tell that your alignment assistant AIs are actually acting in your interest rather than working against you? Especially as they could just be pretending to care about what you care about.
How do you know that these technical problems can be solved at all, even in principle?
At the point that models are able to help with alignment, won't they also be so good at improving capabilities that we're in the middle of an explosion in what AI can do?
In today's interview, host Rob Wiblin puts these doubts to Jan to hear how he responds to each, and they also cover:
OpenAI's current plans to achieve 'superalignment' and the reasoning behind them
Why alignment work is the most fundamental and scientifically interesting research in ML
The kinds of people he’s excited to hire to join his team and maybe save the world
What most readers misunderstood about the OpenAI

Titles in the series (100)

Unusually in-depth conversations about the world's most pressing problems and what you can do to solve them. Subscribe by searching for '80,000 Hours' wherever you get podcasts. Produced by Keiran Harris. Hosted by Rob Wiblin, Head of Research at 80,000 Hours.