# Probability of unfriendly and friendly AI

A quick breakdown of my probability estimates for an extinction risk due to artificial general intelligence (short: unfriendly AI), for the possibility that such an outcome might be averted by the creation of a friendly AI, and for the chance that the Machine Intelligence Research Institute (short: MIRI) will play an important technical role in this.

Probability of an extinction by artificial general intelligence: 5 × 10^-10

1%, that an information-theoretically simple artificial general intelligence is feasible (where “simple” means that it has less than 0.1% of the complexity of an emulation of the human brain), as opposed to a very complex “Kludge AI” that is being discovered piece by piece (or evolved) over a long period of time (where “long period of time” means more than 150 years).

0.1%, conditional on the above, that such an AI cannot or will not be technically confined, and that it will by default exhibit all basic AI drives in an unbounded manner (i.e. that friendly AI is required to make an AI sufficiently safe that it does not want to wipe out humanity).

1%, conditional on the above, that an intelligence explosion is possible (that it takes less than 2 decades after the invention of an AI (that is roughly as good as humans (or better, perhaps unevenly) at mathematics, programming, engineering and science) for it to self-modify (possibly with human support) to decisively outsmart humans at the achievement of complex goals in complex environments).

5%, conditional on the above, that such an intelligence explosion is unstoppable (e.g. by switching the AI off, or by nuking it), and that it will result in human extinction (e.g. because the AI perceives humans to be a risk, or a resource).

10%, conditional on the above, that humanity will not first be wiped out by something other than an unfriendly AI (e.g. molecular nanotechnology invented with the help of a narrow AI).

Probability of a positive technical contribution to friendly AI by MIRI: 2.5 × 10^-15

0.01%, conditional on the above, that friendly AI is possible, can be solved in time, and that it will not worsen the situation by either getting some detail wrong or by making AI more likely.

5%, conditional on the above, that the Machine Intelligence Research Institute will make an important technical contribution to friendly AI.
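The chain of conditional estimates above can be checked mechanically. A minimal sketch (the factor labels are my own descriptions, not the author's wording):

```python
# Multiply the chained conditional probability estimates from the post.
# The label on each factor is a paraphrase of the corresponding step.
extinction_factors = [
    0.01,   # information-theoretically simple AGI is feasible
    0.001,  # cannot be confined, exhibits unbounded AI drives
    0.01,   # intelligence explosion within two decades is possible
    0.05,   # explosion is unstoppable and results in extinction
    0.10,   # humanity is not wiped out by something else first
]

p_extinction = 1.0
for p in extinction_factors:
    p_extinction *= p
print(f"P(extinction by AGI) = {p_extinction:.1e}")  # → 5.0e-10

# Continue the chain for a positive technical contribution by MIRI:
# 0.01% that friendly AI works out, 5% that MIRI contributes.
p_miri = p_extinction * 0.0001 * 0.05
print(f"P(MIRI contribution) = {p_miri:.1e}")  # → 2.5e-15
```

Multiplying out the stated factors gives 5 × 10^-10 for the first headline figure and 2.5 × 10^-15 for the second.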

1. Thanks! Even though I disagree with most of your probability estimates (I’d give 25%, 5%, 50%, 50%, 90%, 1% and 10%), they’re a handy equation.

2. I think you can only handle this sort of issue on a strategic level. I.e., is it a useful strategy to ‘appoint’ people to direct research when they would be unable to reach such a position otherwise (e.g. in academia and big government projects) due to their low level of functioning? Generally not, and especially not when there is a safety concern.

3. Many of these estimates seem quite weak; I find it hard to take them very seriously. Though I approve of criticism and I am very happy to see people thinking quantitatively about the future, I think you should be a bit more careful (I have a similar impression of much of the analysis on the blog). If you are making a good faith effort to give credences from a position of ignorance, then I don’t think your estimates should be quite so extreme. If you have thought a lot about these issues, it seems you might be laboring under unusually extreme distortions (these estimates are significantly more outlandish than even Eliezer’s, which I will readily admit seem quite outlandish).

For example: 1% on intelligence explosion in 2 decades? As far as I can tell, today a typical pattern is for the efficiency of solutions to challenging algorithmic problems to double over a few years, suggesting a factor of 1000 improvement under a not-very-surprising rate of progress. In that case you would expect a jump from “\$1M for a human substitute” to “\$1000 for a human substitute,” i.e. to go from 1 human to 1000 humans (which I think we’d grant can decisively outsmart 1 human?). This is not to mention that it is hard to assign such a low probability to various bona fide intelligence explosion scenarios, e.g. ones in which AI design is unusually amenable to automation amongst cognitive tasks. It’s also not to mention the possibility of an ordinary speed-up resulting from automation of intellectual labor, which is expected under basically all ordinary models of human activity and which would occur over a much shorter timescale than 2 decades. Also, note that few experts share this view as far as I can tell (I also find their skepticism about takeoff surprising, but even they aren’t anywhere near that skeptical).
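The factor-of-1000 figure follows from compounding: doubling every couple of years for two decades gives 2^10 ≈ 1000. A quick check, where the two-year doubling period is my own illustrative stand-in for “a few years”:

```python
# Compound a doubling every 2 years over a 20-year window.
# The 2-year period is an assumed value for "a few years".
doubling_period_years = 2
horizon_years = 20
improvement = 2 ** (horizon_years / doubling_period_years)
print(improvement)  # → 1024.0, i.e. roughly a factor of 1000
```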

1% on information-theoretically simple AI being possible? 99% is a lot of confidence to have in a thing which you know basically nothing about. The complexity of a brain emulation is quite high, maybe that’s what we are disagreeing about (certainly well over 10^14 bits, without some understanding of the brain to do more intelligent compression, which would put 0.1% at 10^11 bits, which is absolutely astronomical for a piece of software)? Getting to a complexity that high requires a departure from the status quo just to accumulate so much useful complexity at all. Moreover, we can see many plausible approaches which would yield much simpler AI if they ended up getting there (basically every serious research program in the field, in fact), and I don’t see why you would be so confident either that none of them will succeed or that future work will look so unlike the status quo. I think there are again very few experts who would agree with you on this count.

0.1% on AI not being technically confined and exhibiting interest in resource acquisition, even after conditioning on AI being surprisingly simple? I don’t know which one you are giving less than 3%, but both seem completely indefensible. These are extraordinarily confident predictions about the future on the one hand, and about a technical subject which we don’t know much about on the other. Why is AI virtually certain to be uninterested in resource acquisition, given that many simple models of AI would be, as would a fair fraction of humans (even when interacting with a non-alien society)?

10% probability that humanity won’t be killed by something else first? I don’t see how conditioning on the earlier properties would cause this probability to go so high, and I’ve never heard of anyone informed giving an estimate anything like this.

4. > For example: 1% on intelligence explosion in 2 decades?

What I meant is that an information-theoretically simple and unobstructed AI might not require more than 2 decades after its invention to make itself superhuman (possibly with the help of humanity). I edited the post to account for this misunderstanding.

> As far as I can tell, today a typical pattern is for the efficiency of solutions to challenging algorithmic problems to double over a few years, suggesting a factor of 1000 improvement under a not-very-surprising rate of progress.

There are strong diminishing returns for the most important abilities. Consider an artificial intelligence that is tasked with writing political speeches. What would it mean to be strongly superhuman at writing such speeches? Would an intelligence that is a million times smarter than humans be a million times better at writing political speeches, and would an intelligence that is a trillion times smarter be a million times better at it than an intelligence that is only a million times smarter?

In order to become better at writing political speeches, there are basically three possibilities: (a) improving the algorithms (and the improvement algorithms themselves, etc.), (b) increasing the computational resources, and (c) obtaining empirical data. What combination of these possibilities would cause the ability to write political speeches to explode, and how?

What would an intelligence explosion of such a political-speech-writing AI look like? Would it just throw more computing resources at improving each sentence, and thereby improve the sentences linearly or even exponentially? Or would it program an algorithm that can program improved political-speech-writing algorithms, and concentrate on improving this algorithm? Would each improvement make it better at further improvement?

More specifically, consider an algorithm whose computation outputs political speeches. I will call this algorithm PSA (political-speech algorithm). In an intelligence explosion this algorithm would have to be improved, which demands the existence of an algorithm that can improve PSA. I will call this algorithm PSAIA (PSA improvement algorithm). We could continue with an algorithm that can improve PSAIA. Where does this end? And is this sufficient to obtain strong returns from a positive feedback loop?
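One way to make the diminishing-returns question concrete is a toy model: let each round of self-improvement multiply capability by a factor that either shrinks over time (diminishing returns) or stays constant (sustained feedback). The growth laws below are purely illustrative assumptions, not claims about any real AI:

```python
# Toy model: capability after n rounds of recursive self-improvement.
# gain(n) is the multiplicative improvement discovered in round n.
def run(gain, rounds=20):
    capability = 1.0
    for n in range(1, rounds + 1):
        capability *= gain(n)
    return capability

# Diminishing returns: each round's gain shrinks toward 1,
# so total capability plateaus near a constant.
diminishing = run(lambda n: 1 + 1 / n**2)

# Sustained feedback: every round yields the same 50% gain,
# so total capability grows geometrically (1.5^20 ≈ 3325).
explosive = run(lambda n: 1.5)

print(f"diminishing returns: {diminishing:.2f}")
print(f"sustained feedback:  {explosive:.2e}")
```

Whether the PSA/PSAIA feedback loop looks more like the first or the second regime is exactly the open question the paragraph above poses.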

> 1% on information-theoretically simple AI being possible? 99% is a lot of confidence to have in a thing which you know basically nothing about.

The idea is that an intelligence could improve against instrumental goals, such as survival. But this is not well-defined. What does it mean to survive? Does it mean that you care about whether the simulator gods might be enraged by your actions? Or just whether there is a fire in your server farm? The answer is that you can’t draw such a line by means of consequentialism. Such a line can only be drawn arbitrarily. And evolution provided a lot of such arbitrary lines through our complex values and psychological features such as boredom.

It will be extremely hard to avoid combinatorial explosions in AI. You need a lot of complexity to encode all the arbitrary lines that need to be drawn in order for an AI to care about only a limited number of possibilities. This is also the reason why there are no artificial mathematicians. Mathematics may be well-defined, but it is infinite. Without limiting the search space you will never stumble upon interesting mathematics. And what is “interesting” has to be defined, which is the hard part.

> Why is AI virtually certain to be uninterested in resource acquisition, given that many simple models of AI would be, as would a fair fraction of humans (even when interacting with a non-alien society)?

See for example here or here.

> 10% probability that humanity won’t be killed by something else first?

If the other conditions hold then it is likely that the AI’s precursors will already be sufficiently dangerous. See e.g. here.