Paul Christiano on the safety of future AI systems

By Asya Bergal, 11 September 2019

As part of our AI optimism project, we talked to Paul Christiano about why he is relatively hopeful about the arrival of advanced AI going well. Paul Christiano works on AI alignment on the safety team at OpenAI. He is also a research associate at FHI, a board member at Ought, and a recent graduate of the theory group at UC Berkeley.

Paul gave us a number of key disagreements he has with researchers at the Machine Intelligence Research Institute (MIRI), including:

Paul thinks there isn’t good evidence now to justify a confidently pessimistic position on AI, so the fact that people aren’t worried now doesn’t mean they won’t be worried when we’re closer to human-level intelligence.
Paul thinks that many algorithmic or theoretical problems are either solvable within 100 years or provably impossible.
Paul thinks not having solved many AI safety problems yet shouldn’t give us much evidence about their difficulty.
Paul’s criterion for alignment success isn’t ensuring that all future intelligences are aligned; it’s leaving the next generation of intelligences in at least as good a place as humans were when building them.
Paul doesn’t think that the evolution analogy suggests that we are doomed in our attempts to align smarter AIs; e.g. if we tried, it seems likely that we could breed animals that are slightly smarter than humans and are also friendly and docile.
Paul cares about trying to build aligned AIs that are competitive with unaligned AIs, whereas MIRI is going for a less ambitious goal of building a narrow aligned AI without destroying the world.
Unlike MIRI, Paul is relatively optimistic about verification and interpretability for the inner alignment of AI systems.

Paul also talked about evidence that might change his views; in particular, observing whether future AIs become deceptive and how they respond to simple oversight. He also described how research into the AI alignment approach he’s working on now, iterated distillation and amplification (IDA), is likely to be useful even if human-level intelligences don’t look like the AI systems we have today.

A full transcript of our conversation, lightly edited for concision and clarity, can be found here.

AI Impacts blog
