I started working at AI Impacts slightly less than a year ago. Before then, I was not following developments in either AI or AI safety. I do not consider myself a rationalist and did not engage with LessWrong before starting this job. While I have mostly been working on historical case studies,[1] I have gotten a close look at the AI safety community and the arguments therein. I live in a rationalist group house and work out of an AI safety office. I think I have about as informed an opinion on AI safety as is possible without doing a bunch of technical alignment research or being involved in the community for years.
Here are my current opinions on AI safety. Some of them may be wrong: I endorse being wrong more often if the alternative is not saying things of consequence.
This is presented as an organized list of my thoughts. There are arguments in my head justifying most of these, but I will not be spelling them out in detail here. I will link to more detailed arguments when they are available. If something is in italics, then I wrote the argument at that link, or intend to write about it in the future.
This should be readable at any level of the list. If you want a quick overview, you can just read the top-level points, in bold. Or you can read some details, but not others. Or you can read everything.
I am mostly unconvinced by the classic story of AI risk.
Currently, AI is not very significant in the global economy / human society. In order to become impactful, the capabilities of AI will have to increase.
AI capabilities are increasing rapidly now.
It is not clear how much AI capabilities will need to increase in order for AI to become very impactful.
Several ways of operationalizing extremely capable AI include whole brain emulation (WBE), artificial general intelligence (AGI), and transformative artificial intelligence (TAI).
Whole brain emulation seems impossible on a (classical) computer.
Neurons / synapses are not logic gates. They have complicated internal dynamics.
Computer engineers put a lot of effort into making sure that individual bits are either 0 or 1 and are independent of any microscopic randomness. Brains do not have this. There is the possibility of chaotic - and also non-chaotic - behavior at any scale. There is no non-chaotic mesoscale preventing unpredictable and uncontrollable microscopic fluctuations from influencing relevant macroscopic behavior.
There are some examples of inherently quantum effects impacting the macroscopic behavior of some biological systems. These are sometimes not things where the effects can be fully captured by setting some macroscopic parameters.
I am sympathetic to quantum theories of the mind. I don’t think any that currently exists is satisfying or has solid evidence, but I’m glad that there is work in this direction.
We have not gotten whole brain emulation to work for C. elegans, even though it has only 302 neurons.
It feels like there is some assumption that information theory solves the hard problem of consciousness, or maybe something more vague about what it means to be a person.
The thinking here often feels implicit or unclear, so I’m not sure exactly how to operationalize the disagreement, but it feels like something is very wrong here.
AGI seems more plausible than whole brain emulation, but still extraordinarily difficult or impossible.
I’m using the Annie Oakley definition of AGI: Anything You Can Do I Can Do Better.
Being able to exactly model a human would be sufficient for AGI, but I do not think that a complicated indeterministic system (like humans) can be well modeled by a complicated deterministic system plus a simple random number generator.
AGI is more difficult than being superhuman at every well-specified task because humans can do, and create, things which are not well-specified tasks.
Here are some things that humans do that I think are particularly hard for AI:
Choosing your own end goals.
It’s not clear that we would want powerful AI to have this ability.
A particular notion of creativity that involves increasing the dimension of the space of possible options, as opposed to recombining existing options.[2]
Human abilities do not just include a list of effectiveness at some particular skills - they also include the ability to create new skills.
Responding well to completely novel situations.
In particular, recursive self-improvement feels pretty unlikely, especially if people don’t actively try to make something that recursively self-improves.
TAI seems much more plausible than AGI, but still very difficult.
By ‘TAI’, I mean ‘AI that could pose an existential threat’.
Killing all humans seems very hard, but easier than replicating everything humans can or might do.
I would guess that the most important threats involve interactions with other potential x-risks, like hacking nuclear arsenals or engineering viruses.
I think that the threat from highly agentic systems is overrated, the threat from misuse is slightly underrated, and the threat from accidents should be highest rated.[3]
Aligning powerful AI does seem to be pretty hard, but not completely hopeless.
I don’t have any particular expertise here, so I’m deferring to what seems to be a moderate position in the AI safety community:
We don’t know how to align current systems.
We don’t know how to tell if a system is aligned.
We don’t know if alignment techniques will scale better or worse than capabilities.
Strong optimization seems generically dangerous and hard to predict or control.
But maybe we don’t need to get exact human values. Close is good enough.
Being able to iterate with near-human-level AI systems would be valuable, and seems likely for any particular skill (see below).
There are multiple lines of research which people are following that might bear fruit. As more people work on AI alignment, new strategies will be developed.
This feels like it depends on the relative rate of progress in alignment and in capabilities.
Partially, this will be due to technical details of how hard the problems are.
This is also something we can influence, by getting more people to do alignment research or by slowing progress in AI capabilities.
Getting alignment right is less important if AI ends up being less powerful.
The time to cross the human range seems likely to be large for any particular skill.
The range of human ability seems like it spans at least an order of magnitude for many important skills.
This is probably measurably true for some tasks: vocabulary (or number of languages) known, speed of doing arithmetic, writing speed, sales commissions, speed of learning to play a song.
This feels even more true for some immeasurable things: groundbreaking scientific research, producing great art/music/literature, impacting people’s lives.
Discontinuities in technological progress exist, but they are uncommon.
Especially recent, large discontinuities.
For the tasks for which computers are currently superhuman, it took decades to cross the human range.
Technological progress stops sometimes.
It shouldn’t be too surprising if AI capabilities development abruptly hits a wall and stops progressing for a few decades.
One particular way this might occur: When can training something to mimic human behavior allow it to become superhuman at that behavior?
Other possibilities include the end of Moore’s Law and running out of human-generated data to train on.
Also unknown unknowns.
It is almost guaranteed that we will encounter problems we aren’t currently considering. What is unclear is how insurmountable these future problems will be.
When you're trying to do something hard, the effect of unknown unknowns is not symmetric. You're more likely to encounter an unknown unknown that hinders you than helps you.
AI is probably less impactful than people expect.
We could try to measure how influential a particular industry is on the rest of the economy/society.
If a new technology from a particular industry increases that industry’s efficiency by some percent, how much does that cause overall productivity to increase or people’s quality of life to improve?
The answer could be thought of as a multiplier.
We could rank industries by this multiplier.
My guess is that energy has the largest multiplier.
It seems like, so far, information technology has an extremely low multiplier.
Even extremely widespread adoption of a new technology in this industry does not change productivity or quality of life that much.
Instead, it shows up everywhere but the productivity statistics.
This would suggest that people tend to significantly overestimate the importance of advances in IT generally.
Maybe AI will be different, but this should lower our expectations of its impact.
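One way to make the multiplier idea concrete is sketched below. This is only an illustration: the function name `impact_multiplier` and every number in `HYPOTHETICAL_GAINS` are made-up placeholders, not measurements, chosen solely to show how industries could be ranked once real data exists.

```python
def impact_multiplier(industry_gain_pct: float, overall_gain_pct: float) -> float:
    """Ratio of the economy-wide productivity gain to the industry's own efficiency gain."""
    if industry_gain_pct == 0:
        raise ValueError("industry gain must be nonzero to define a multiplier")
    return overall_gain_pct / industry_gain_pct


# HYPOTHETICAL placeholder figures (percent gains), for illustration only:
# industry -> (efficiency gain within the industry, resulting overall productivity gain)
HYPOTHETICAL_GAINS = {
    "energy": (10.0, 5.0),
    "manufacturing": (10.0, 2.0),
    "information technology": (10.0, 0.5),
}


def rank_by_multiplier(gains: dict[str, tuple[float, float]]) -> list[tuple[str, float]]:
    """Return (industry, multiplier) pairs sorted from largest to smallest multiplier."""
    multipliers = {
        name: impact_multiplier(industry, overall)
        for name, (industry, overall) in gains.items()
    }
    return sorted(multipliers.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for name, multiplier in rank_by_multiplier(HYPOTHETICAL_GAINS):
        print(f"{name}: multiplier = {multiplier:.2f}")
```

The hard part is not this arithmetic but measuring the two inputs; the sketch only shows what the ranking would look like once they are estimated.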
Governance doesn’t feel that hard.
Creating really good regulations, which facilitate all good research and prevent all dangerous research, is hard because no one knows how to distinguish good from dangerous research. If the goal is to slow or stop AI development, that is much more tractable.
Many scary-seeming technologies are heavily regulated.
I don’t quite want to say that this is the default option, but it should not be at all surprising.
It feels like more industries are over-regulated than under-regulated.
Software has an unusually small amount of regulation.
Advocacy was often necessary to create the regulation, but the advocacy wasn’t that hard and it was often successful.
Public surveys suggest that many/most people are open to more regulation of AI.
Countries copy each other’s policies a lot, even when there’s little international pressure to do so.
A lot of mimicry happens with little or no effort from major actors.
Examples include medical ethics and geoengineering.
Also anecdotal evidence from people who have worked in government in small countries.
Coordinating the whole world is probably not much harder than coordinating 2-3 countries.
We already have laws that prevent people from building some weapons systems or related technologies.
Enforcement mechanisms are feasible and not dystopian.
Leading AI systems currently require a lot of computing power. If this changes, and dangerous AI can be built on a laptop, then enforcement becomes harder.
Regulation can shift techno-hype towards other technologies, which makes future regulation easier.
My p(Doom) is in the low single digits. Maybe 3%? This is a gut feeling probability, rather than a Fermi estimate.
The view from within these arguments feels even lower. I’m discounting some because of uncertainty about whether these arguments are valid.
The central vision of the future I expect is that AI continues to do some impressive things and becomes more widespread, but it does not have much material impact on most people’s lives. I think that AI progress completely stopping and AI progress actually affecting the productivity statistics are similarly plausible.
I nevertheless support slowing AI development.
Even 1% is an extremely high chance of death.[4]
If you think that there is a 1% chance of AI killing everyone, that means that, in expectation, AI kills 80 million people.
This is about as bad as Stalin & Mao combined.
COVID-19 also had about a 1% risk of death. We were willing to temporarily shut down large sections of the global economy to respond to this threat.
Pausing AI development for 6 months would be like a COVID shutdown, except only directed at one small corner of one industry.
This is a much smaller cost to respond to a similar scale, although differently distributed, risk.
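The 80 million figure above is just the probability multiplied by the world population. A minimal sketch of that arithmetic, assuming a world population of roughly 8 billion (the constant and function name are mine, for illustration):

```python
# Assumption: world population of roughly 8 billion people.
WORLD_POPULATION = 8_000_000_000


def expected_deaths(p_doom: float, population: int = WORLD_POPULATION) -> float:
    """Expected number of deaths if everyone dies with probability p_doom."""
    if not 0.0 <= p_doom <= 1.0:
        raise ValueError("p_doom must be a probability between 0 and 1")
    return p_doom * population


if __name__ == "__main__":
    # A few probabilities in the 0.1% - 10% range discussed in this post.
    for p in (0.001, 0.01, 0.03, 0.10):
        millions = expected_deaths(p) / 1e6
        print(f"p(doom) = {p:.1%}: about {millions:,.0f} million expected deaths")
```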
Thinking ‘in expectation’ is probably not the best way of approaching this.
We don’t live in the in-expectation world.
Humanity can definitely recover from losing 1% of the population. An existential risk is worse than a risk of the same scale applied to each person independently.
You don’t bet the farm on an uncertain prospect. High variance, high expectation bets are worse than low variance, moderate expectation bets if survival is at stake.
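A toy comparison can show why expectation alone is misleading when ruin is possible. The two bets and all of their numbers below are hypothetical, picked only to illustrate the point: the risky bet has the higher expected payoff per round, yet it is unlikely to survive many rounds.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Bet:
    p_ruin: float  # hypothetical chance per round of losing everything
    growth: float  # hypothetical wealth multiplier in a round without ruin

    def expected_multiplier(self) -> float:
        """Expected wealth multiplier for a single round (ruin counts as zero)."""
        return (1.0 - self.p_ruin) * self.growth

    def survival_probability(self, rounds: int) -> float:
        """Probability of never hitting ruin over the given number of rounds."""
        return (1.0 - self.p_ruin) ** rounds


# Hypothetical bets: low variance with moderate expectation vs.
# high variance with high expectation and a small chance of ruin.
cautious = Bet(p_ruin=0.0, growth=1.05)
risky = Bet(p_ruin=0.01, growth=1.20)
ROUNDS = 100

if __name__ == "__main__":
    for name, bet in (("cautious", cautious), ("risky", risky)):
        print(
            f"{name}: expected multiplier per round = {bet.expected_multiplier():.3f}, "
            f"chance of surviving {ROUNDS} rounds = {bet.survival_probability(ROUNDS):.1%}"
        )
```

With these made-up numbers, the risky bet looks better round by round, but it is more likely than not to have ended in ruin after 100 rounds, while the cautious bet always survives.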
Other technologies which have significant x-risk should not be pursued, or only developed extremely cautiously.
Most technologies’ x-risk is many orders of magnitude lower than 1%. Things with comparable x-risk are considered to be really dangerous.
Toby Ord estimates that nuclear weapons have an x-risk of about 1/1000.
I support nuclear non-proliferation, reducing the size of nuclear stockpiles, and strong norms against the use of nuclear weapons.
Toby Ord estimates that climate change has an x-risk of about 1/1000.
I support transitioning to conventional nuclear power, using hydro, wind, & solar where they’re most effective, and developing new power sources like fusion & deep geothermal.
Toby Ord estimates that the x-risk from new diseases is about 1/30.
I support bans on biological weapons, stopping gain of function research, faster vaccine research & approval mechanisms, better pandemic detection & monitoring systems, and maintaining latent capacity in our health care & medical manufacturing industries.
Part of my motivation is due to non-existential problems these threats pose, but x-risk itself is also really bad.
AI should be treated similarly to other technologies with a 0.1% - 10% x-risk.
This means strict regulation, slow & cautious development, and considering outright bans.
The reasons I think AI x-risk is unlikely also argue against Our Glorious Future coming from AGI, so I expect that there is less to be gained by not slowing AI.
Some people claim that the benefits of AGI are so large that they outweigh the x-risk.
This is not how you should think about risk when survival is on the line.
Don’t bet the farm.
Some people are more worried about their personal death than the survival of humanity.[5]
This is pure selfishness.
This may feel persuasive to you, but no one else should be persuaded by your selfishness.
If whole brain emulation is harder than TAI, then we would face x-risk before we get digital immortality via mind uploading.
Mind uploading might not even be a thing because of the hard problem of consciousness.
There are a lot of other exciting technologies people could be working towards instead.
Working on AI might not be the best way to advance technology in the ways you care about.
It might be better to work on the problem you care about directly than to try to route through a potentially fully general problem solving tool that has significant x-risk.
If you want to advance public health, do you work on mRNA vaccines or on AI?
If you want to colonize space, do you work on rockets or on AI?
If you want better sources of energy, do you work on batteries or on AI?
AI might not end up being a feasible way to solve these problems.
There is opportunity cost when people decide to work on AI.
Elon Musk has a space colonization company. Sam Altman has a fusion company. They could be advancing technology in more concrete ways instead of trying to create something that poses an existential threat.
CEOs are the most famous, but this is also true when people reskill in order to work on AI or when students decide that they want their career to be in AI.
More people should be thinking of atoms, not bits, in order to improve the future.
Regulation now can steer the hype train.
Technological development is path dependent, so shifting focus can have long-term consequences.
When some technologies are favored and some disfavored by regulation, it signals what the next big technology will or won’t be and encourages or discourages research in that direction.
For example, compare the trends in cost of nuclear power vs. renewables.
The time of perils doesn’t have to be ended by AI.
Space colonization would dramatically reduce x-risk.
It’s not clear to me that ‘time of perils’ is even a good way to understand our technological past, present, and future.
Even if this is a good framing, it’s not clear whether we should try to end the time of perils as quickly as possible by building AGI now, or whether it’s better to make this time less perilous by being more cautious.
If most of the risk is not from AI, and we were convinced AGI would solve the problems, then building AGI quickly would make sense.
That doesn’t seem to be the world we live in.
Regulation designed to slow AI might not even slow technological progress overall.
Many people currently working on AI would work on something else exciting.
Progress in IT has been less impactful on overall productivity and quality of life than progress in other fields.
But there are fixed costs as people transition and some people’s jobs will be less well matched to their skills.
The market should sort this out, unless people systematically overestimate the value of AI, or IT more generally. I think that they do.
Even if there is a net cost to technological progress, it could be worth it to reduce x-risk.
I don’t trust leading AI companies to be responsible.
Explicit statements that they’re racing.
Completely dismissing AI safety concerns.
Although others seem to be taking AI safety seriously.
I have a really high standard for ‘responsible’ when potential x-risks are involved.
Other people and groups in the AI community are much less responsible.
Purposefully building dangerous versions of things to get people scared.
Examples: deliberately very unaligned systems, highly agentic systems, and systems for weapons design.
Presumably there are plans to stop doing this before things are actually dangerous. Do they know when this is?
Even if I think something is probably impossible, that doesn’t mean I want someone to be actively trying to build it.
I don’t know of any other industry where this is a thing.
There are tests to failure, like crash tests, and red teaming to find existing flaws, but not redesigns to make things more dangerous.
Tullock’s spike is a thought experiment, not something car companies make.
While leading models are not open source, it does not seem to take very long for similar capabilities to become open source.
Leading labs sometimes provide resources for other people to make open source models.
I don’t know how hardened AI labs are against leaks.
I am sympathetic to “information wants to be free” arguments for most things, but not if it’s a potential x-risk.
We shouldn’t want nuclear weapons designs or virus engineering to be open source either.
There exist people who build dangerous things for the LOLs.
Many leading AI companies have close relationships with the AI safety community. But it also feels like they have close relationships with communities pulling them in less responsible directions.
Even if we figure out how to build AGI, solve the technical alignment problem, and leading AI labs follow the AI safety community, I’m not convinced that things will end well.
This isn’t the future I expect, but many people reading this probably hope for this future, so it is worth addressing.
I don’t think that the AI safety community is hardened against takeover.
If the AI safety community did gain significant power, various ideological groups or selfish individuals would try to gain influence over it.
It is common for political revolutions to be hijacked by a populist dictator who outmaneuvers the original leaders.
This is not a great analogy, but it is concerning.
It feels like someone who learns the shibboleths and offers money or influence could help make strategic decisions in the community.
Sam Bankman-Fried is an example. Hopefully this has improved since the FTX collapse.
It feels like many people in the AI safety community have a limited understanding of evil.
This is great if it means that many people haven’t had to interact with terrible people.
But it is a potential source of naivety / vulnerability.
There isn’t a vision of Our Glorious Future with widespread support among the AI safety community.
The AI safety community attracts futurists of many types. The community feels simultaneously highly non-representative of society and not unified in a vision of the future.
Some people claim that superintelligence aligned to anyone’s values is good enough.
It feels like this attitude underestimates how terrible some ideologies that people endorse are.
This seems better than everyone dying from unaligned TAI, but not better than continued technological progress without TAI.
The utopian dreams of many leaders of the community are often unspecified (or unconsidered?).
If they were laid out, they would likely see significant debate within, and from outside, the community.
This is a potential source of conflict in the future.
Some important unresolved questions:
Are we trying to protect humanity or usher in transhumanism?
How seriously should we take the potential rights of AI systems themselves?
How should we weigh the needs of people who currently exist, compared to the needs of potentially large numbers of people who might exist in the future?
Is there continuity of self? How does this impact existing people’s rights?
How much democratic control should there be? Or is it preferable for the most ‘rational’ people to have more control?
Should we be willing to seize significant power over the world to prevent current or future x-risks? What pivotal acts are or are not acceptable to take?
Is value lock-in the goal of alignment or something bad?
What costs should we be willing to pay to protect, or avoid harming, people who don’t want to take part in Our Glorious Future?
How much x-risk is worth tolerating to achieve Our Glorious Future? How much suffering, or restrictions on freedom, or other more mundane costs are worth tolerating to achieve Our Glorious Future?
Even if we don’t want the AI safety community to be the ones answering these questions, I would like the community to make their preferred answers more public.
It is useful to start working on problems before you have to solve them.
Having well-specified hopes gives us a standard to judge community leaders by in the future and makes takeover harder.
In order for the public to meaningfully engage with the AI safety community, it helps them to know the community’s goals.
Not telling the public what we hope to achieve with powerful AI gives them less ability to contribute to answering these questions.
If there were a vote or something where the community determined its answers to these and other questions, I would not be surprised if I strongly disagreed with the results.
This is not because I know that I would answer them correctly. If future AI systems are anywhere near as powerful as many people in the AI safety community think they would be, then I do not trust anyone with that much power.
I don’t think that AI will be that powerful, but that does not mean that we should be trying to build it.
I think that AI doom is unlikely (but not extremely unlikely), and that we should be trying to slow or stop AI development. Most people who have higher p(Doom) than me seem wrong about the world, particularly the nature of intelligence, agency, or what it means to be human. Most people who are less willing to slow AI than me seem wrong about how to act, particularly when dealing with an x-risk. Many people in the AI safety community both have higher p(Doom) and are less willing to slow AI than me. This seems bad.
My guess is that this is mostly selection bias. The consensus view on LessWrong had been[6] that AI is very dangerous, but we shouldn’t try to slow or stop it. This has attracted people to the AI safety community who believe this. The taboo on advocating for slowing or stopping AI has only been broken within the last year or two. Already, the community has shifted significantly in this direction.
My guess is that as more people become familiar with AI safety, they’ll mostly end up close to my position:[7] skeptical, but concerned. This should make governance easier: people do not have to be completely sold on the whole AI x-risk argument to be willing to regulate, slow, or even stop the development of potentially dangerous AI.
[1] Which is why AI Impacts hired me.
[2] I don’t intend to engage much about this point until I’ve written more on this.
[3] It’s not clear to me whether this is underrated or appropriately rated. There’s a lot of work being done on particular problems, most of which I am not familiar with.
[4] I said 3% before, but I don’t think that the rest of the argument changes much as long as it is low single digits. 1% is more memorable.
[5] This is partially based on personal conversations I’ve had and partially based on other people who have been surprised by how common this position is.
[6] As of 2021.
[7] This is guaranteed by the typical mind fallacy.
> A particular notion of creativity that involves increasing the dimension of the space of possible options, as opposed to recombining existing options.
To me, the fact that AlphaZero has come up with new Go openings seems to basically refute this. You can justify it as "well it's recombining existing options of moves on the Go board" but that seems as uselessly reductionistic as describing any human creativity as recombining existing options to contract various muscles.
> There are some examples of inherently quantum effects impacting the macroscopic behavior of some biological systems. These are sometimes not things where the effects can be fully captured by setting some macroscopic parameters.
Aren't classical computers capable of simulating quantum systems?