This page lists all my criticisms of artificial general intelligence as a catastrophic risk.
Introduction: A Primer On Risks From AI
For my probability estimates of unfriendly and friendly AI, see here.
AIs, Goals, and Risks
The concepts of a “terminal goal”, and of a “Do-What-I-Mean dynamic”, are fallacious. The former can’t be grounded without leading to an infinite regress. The latter erroneously makes a distinction between (a) the generally intelligent behavior of an AI, and (b) whether an AI behaves in accordance with human intentions, since generally intelligent behavior of intelligently designed machines is implemented intentionally.
Smarter and smarter, then magic happens…
(1) Present-day software is better than previous software generations at understanding and doing what humans mean.
(2) There will be future generations of software which will be better than the current generation at understanding and doing what humans mean.
(3) If there is better software, there will be even better software afterwards.
(4) Magic happens.
(5) Software will be superhumanly good at understanding what humans mean but catastrophically worse than all previous generations at doing what humans mean.
On literal genies and complex goals
Imagine that advanced aliens came to Earth and removed all of your unnecessary motives, desires and drives and made you completely addicted to “znkvzvmr uhzna unccvarff”. All your complex human values are gone. All you have is this massive urge to do “znkvzvmr uhzna unccvarff”, everything else has become irrelevant. They made “znkvzvmr uhzna unccvarff” your terminal goal.
Implicit constraints of practical goals
The goal “Minimize human suffering” is, on its most basic level, a problem in physics and mathematics. Ignoring various important facts about the universe, e.g. human language and values, would be simply wrong. In the same way that it would be wrong to solve the theory of everything within the scope of cartoon physics. Any process that is broken in such a way would be unable to improve itself much.
AI vs. humanity and the lack of concrete scenarios
This post is supposed to be a preliminary outline of how to analyze concrete scenarios in which an advanced artificial general intelligence attempts to transform Earth in a catastrophic way.
What I would like AI risk advocates to publish
I would like to see AI risk advocates, or anyone who is convinced of the scary idea, publish a paper that states concisely and mathematically (with extensive references if necessary) the decision procedure that led them to devote their lives to the development of friendly artificial intelligence. I want them to state numeric probability estimates and exemplify their chain of reasoning: how they came up with those numbers and not others, by way of sober and evidence-backed calculations. I would like to see a precise and compelling review of the methodologies AI risk advocates used to arrive at their conclusions.
Interview series on risks from AI
In 2011, Alexander Kruel (XiXiDu) started a Q&A style interview series asking various people about their perception of artificial intelligence and possible risks associated with it.
Risks from AI and Charitable Giving
In this post I just want to take a look at a few premises (P#) that need to be true simultaneously to make AI risk mitigation a worthwhile charitable cause from the point of view of someone trying to do as much good as possible by contributing money. I am going to show that the case for risks from AI is strongly conjunctive, that without a concrete and grounded understanding of AGI an abstract analysis of the issues is going to be very shaky, and that therefore AI risk mitigation is likely to be a bad choice as a charity. In other words, that which speaks in favor of AI risk mitigation consists mainly of highly specific, conjunctive, non-evidence-backed speculations on possible bad outcomes.
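To illustrate what "strongly conjunctive" means here, the following is a minimal sketch and not the calculation from the post itself. The premise labels P1 to P5 and the probabilities assigned to them are purely hypothetical, and the premises are assumed to be independent; if they are not, each factor would have to be a conditional probability given the premises before it.

```python
from math import prod

# Hypothetical probabilities for five premises that all need to be true.
# These numbers are illustrative assumptions, not estimates from the post.
premises = {"P1": 0.8, "P2": 0.8, "P3": 0.8, "P4": 0.8, "P5": 0.8}


def joint_probability(probabilities):
    """Probability that every premise holds, assuming independence."""
    return prod(probabilities)


joint = joint_probability(premises.values())
print(f"Each premise: 0.8, all five together: {joint:.3f}")
```

Even if every single premise looks fairly likely on its own, the probability that all of them hold at once is considerably lower, and it keeps shrinking with each premise that is added.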
Why I am skeptical of risks from AI
In principle we could build antimatter weapons capable of destroying worlds, but in practice it is much harder to accomplish.
There are many question marks when it comes to the possibility of superhuman intelligence, and many more about the possibility of recursive self-improvement. Most of the arguments in favor of those possibilities solely derive their appeal from being vague.
Objections to Coherent Extrapolated Volition
It seems to me that becoming more knowledgeable and smarter is gradually altering our utility functions. But what is it that we are approaching if the extrapolation of our volition becomes a purpose in and of itself? Extrapolating our coherent volition will distort or alter what we really value by installing a new cognitive toolkit designed to achieve an equilibrium between us and other agents with the same toolkit.
Would a singleton be a tool that we can use to get what we want or would the tool use us to do what it does, would we be modeled or would it create models, would we be extrapolating our volition or rather follow our extrapolations?
Four arguments against AI risks
I list four, not necessarily independent, caveats against AI risks that would be valid even if one were to accept (1) that AI will be invented soon enough to be decision relevant at this point in time, (2) that the kind of uncontrollable recursive self-improvement imagined by AI risk advocates is even in principle possible, and (3) that the advantage of greater intelligence scales with the task of taking over the world in such a way that it becomes probable that an AI will succeed in doing so, even given the lack of concrete scenarios on how that is supposed to happen.
AI drives vs. practical research and the lack of specific decision procedures
The objective of this post is (1) to outline how to examine the possibility of the emergence of dangerous goals in generally intelligent systems in the light of practical research and development and (2) to determine what decision procedures would cause generally intelligent systems to exhibit catastrophic side effects.
To beat humans you have to define “winning”
People who claim that artificial general intelligence is going to constitute an existential risk implicitly conjecture that whoever is going to create such an AI will know perfectly well how to formalize capabilities such as <become superhumanly good at mathematics>, while at the same time they will selectively fail at making it solve the mathematics they want it to solve and instead cause it to solve the mathematics that is necessary to kill all humans.
If you claim that it is possible to define the capability <become superhumanly good at mathematics>, then you will need a very good argument to support the claim that at the same time it is difficult to define goals such as <build a house> without causing human extinction.
Reply to Stuart Armstrong on Dumb Superintelligence
Here is a reply to the post ‘The idiot savant AI isn’t an idiot’ which I sent Stuart Armstrong yesterday by email. Since someone has now linked to one of my posts on LessWrong I thought I would make the full reply public.
Distilling the “dumb superintelligence” argument
The intersection of the sets of “intelligently designed AIs” and “dangerous AIs” only contains those AIs which are deliberately designed to be dangerous by malicious humans.
Thank you for steelmanning my arguments
A further refinement of the argument against the claim that fully intended behavior is a very small target to hit.
Goals vs. capabilities in artificial intelligence
The distinction between terminal goals, instrumental goals and an AI’s eventual behavior is misleading for practical AIs. What actions an AI is going to take depends on its general design and not on a specific part of its design that someone happened to label “goal”.
To make your AI interpret something literally you have to define “literally”
The capability to “understand understanding correctly” is a prerequisite for any AI to be capable of taking over the world. At the same time that capability will make it avoid taking over the world as long as it does not accurately reflect what it is meant to do.
Questions regarding the nanotechnology-AI-risk conjunction
Posing questions examining what I call the nanotechnology-AI-risk conjunction, by which I am referring to a scenario that is often mentioned by people concerned about the idea of an artificial general intelligence (short: AI) attaining great power.
AI risk scenario: Deceptive long-term replacement of the human workforce
Some questions about a scenario related to the possibility of an advanced artificial general intelligence (short: AI) overpowering humanity. For the purpose of this post I will label the scenario a deceptive long-term replacement of the human workforce. As with all such scenarios it makes sense to take a closer look by posing certain questions about what needs to be true in order for a given scenario to work out in practice and to be better able to estimate its probability.
AI risk scenario: Elite Cabal
Some remarks and questions about a scenario outlined by Mitchell Porter on how an existential risk scenario involving advanced artificial general intelligence might be caused by a small but powerful network of organizations working for a great power in the interest of national security.
AI risk scenario: Social engineering
Some remarks and questions about a scenario outlined in the LessWrong post ‘For FAI: Is “Molecular Nanotechnology” putting our best foot forward?’ on how an artificial general intelligence (short: AI) could take control of Earth by means of social engineering, rigging elections and killing enemies.
AI risk scenario: Insect-sized drones
Some remarks and questions about a scenario outlined by Tyler Cowen in which insect-sized drones are used to kill people or to carry out terror attacks.
AI risk scenario: Biological warfare
Remarks and questions about the use of biological toxins or infectious agents by an artificial general intelligence (short: AI) to decisively weaken and eventually overpower humanity.
Realistic AI risk scenarios
Scenarios that I deem to be realistic, in which an artificial intelligence (AI) constitutes a catastrophic or existential risk (or worse), are mostly of the kind in which “unfriendly” humans use such AIs as tools facilitating the achievement of human goals.
How does a consequentialist AI work?
The idea of a consequentialist expected utility maximizer is used to infer that artificial general intelligence constitutes an existential risk.
Can we say anything specific about how such an AI could work in practice? And if we are unable to approximate a practical version of such an AI, is it then sensible to use it as a model to make predictions about the behavior of practical AIs?
Narrow vs. General Artificial Intelligence
A comparison chart of the behavior of narrow and general artificial intelligence when supplied with the same task.
Addendum: If an artificial general intelligence was prone to commit errors on the scale of confusing goals such as “win at Jeopardy” with “kill all humans” then it would never succeed at killing all humans because it would make similar mistakes on a wide variety of problems that are necessary to solve in order to do so.
Furniture robots as an existential risk? Beware cached thoughts!
Don’t just assume vague ideas such as <explosive recursive self-improvement>; try to approach the idea in a piecewise fashion. Start out with some narrow AI such as IBM Watson or Apple’s Siri and add various hypothetical self-improvement capabilities, but avoid quantum leaps. Try to locate the point at which those systems start acting in an unbounded fashion, possibly influencing the whole world in a catastrophic way. And if you manage to locate such a tipping point, then take it apart even further. Start over and take even smaller steps; be more specific. How exactly did your well-behaved expert system end up being an existential risk?
Being specific about AI risks
The only way you can arrive at any scenario where an artificial general intelligence is going to kill all humans is by being vague and unspecific, by ignoring real world development processes and by using natural language to describe some sort of fantasy scenario and invoke lots of technological magic.
Once you have to come up with a concrete scenario and outline specifically how that is supposed to happen you’ll notice that you will never actually reach such a tipping point as long as you do not deliberately design the system to behave in such a way.
Taking over the world to compute 1+1
If your superintelligence is too dumb to realize that it doesn’t have to take over the world in order to compute 1+1 then it will never manage to take over the world in the first place.
C. elegans vs. human-level AI
Reading the Wikipedia entry on Caenorhabditis elegans and how much we already understand about this small organism and its 302 neurons makes me even more skeptical of the claim that a human-level artificial intelligence (short: AI) will be created within this century.
How far is AGI?
I don’t believe that people like Jürgen Schmidhuber are a risk, apart from a very abstract possibility.
The reason is that they are unable to show off some applicable progress on a par with IBM Watson or Siri. And in the case that they claim that their work relies on a single mathematical breakthrough, I doubt that it would be justified even in principle to be confident in that prediction.
The argument from the gap between chimpanzees and humans is interesting but cannot be used to extrapolate onwards from human general intelligence.
The Fallacy of AI Drives
I don’t think that a sufficiently intelligent AI will constitute an existential risk.
Description of an AI risk scenario by analogy with nanotechnology
Framed in terms of nanofactories, here is my understanding of a scenario imagined by certain AI risk advocates, in which an artificial general intelligence (AGI) causes human extinction.
Discussion about catastrophic risks from artificial intelligence
A discussion about risks associated with artificial general intelligence, mainly between myself, Richard Loosemore, and Robby Bensinger.
Third Party Links