Taking over the world to compute 1+1

tl;dr: If your superintelligence is too dumb to realize that it doesn’t have to take over the world in order to compute 1+1, then it will never manage to take over the world in the first place.

What some AI risk advocates believe
Goal | Interpreter | Execution
1+1 | Wolfram|Alpha | 2
1+1 | AGI* | Human extinction.
from: Los Angeles to: San Francisco | Google Maps | I-5 N – 382 mi, 5 hours 33 mins
from: Los Angeles to: San Francisco | AGI | Human extinction.
“Set up a meeting about the sales report at 9 a.m. Thursday.” | Siri | Makes a calendar appointment at 9 a.m. Thursday.
“Set up a meeting about the sales report at 9 a.m. Thursday.” | AGI | Human extinction.
“Call hmm uhm Pet…” | Siri | Sorry, I don’t understand “Call hmm uhm Pet…”
“Call hmm uhm Pet…” | AGI | Human extinction.

*AGI = Artificial General Intelligence

In other words, some AI risk advocates believe that, given any goal whatsoever, an artificial general intelligence will always fail at tasks that present-day software tools master with ease. More importantly, it will fail in a highly complex way, usually resulting in an extinction-type scenario. That is, it will fail selectively at doing what it is supposed to do, yet succeed in a superhuman manner at exactly those actions that cause human extinction.

I am writing this post because certain people don’t perceive that possibility to be unmistakably absurd and self-evidently ridiculous, so let’s look at it a bit more closely.

Taking over the world

Under what assumption does it make sense to take over the world?

(Here <taking over the world> can be understood to mean any set of actions an artificial agent could take to cause human extinction.)

Some possibilities:

Taking over the world is…

  1. …an explicitly programmed terminal goal.
  2. …an instrumental goal.

Do we have to worry about point #1? Maybe, but probably not. Humans are often unfriendly and, given the opportunity, some people would certainly try to use an artificial general intelligence to do really bad things. How likely is that to happen? As likely as it is that someone invents an artificial general intelligence in one or a few giant leaps, fast enough that nobody becomes suspicious of possible ulterior motives. As likely as it is that the person or group smart enough to do so has extinction-type ulterior motives in the first place. As likely as it is that such complex goals can be explicitly programmed. As likely as it is that an artificial general intelligence can overpower humanity. In other words, pure fantasy.

What about point #2? Something is instrumentally rational if it is useful in achieving terminal goals. The important point here is that for an agent to be able to conclude that something is instrumentally rational it is necessary for the agent in question to know exactly what terminal goals it has.

Suppose the terminal goal given is <build a hotel>. Is the terminal goal to create a hotel that is just a few nanometers in size? Is it to create a hotel that reaches orbit? It is unknown. The goal is too vague to determine what to do. There are countless ways to interpret the given goal, and each interpretation implies a different set of instrumental goals.

How would an artificial agent choose one interpretation over another? Would it make sense to simply assume the most resource-expensive interpretation and very likely end up doing more than necessary? Would taking over the world, or some other far-reaching action, make sense if it isn’t even clear that it is instrumentally rational to allocate massive resources to do so? Would any artificial agent that didn’t mind taking a lot of unnecessary actions and wasting precious resources ever reach the point where it could constitute a risk?

The only reasonable action seems to be to reduce the vagueness by narrowing down on the most probable interpretation of a goal. And given that the initial goals have been programmed by humans it is obvious that the most probable source of further information are humans.
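
To make "narrowing down on the most probable interpretation" a bit more concrete, here is a toy sketch in Python. Everything in it is an assumption made up for illustration: the three candidate interpretations of <build a hotel>, the prior weights, the hypothetical human answer, and the likelihood numbers. It only shows the shape of the reasoning (a simple Bayes update after asking a human), not how any real system works.

```python
# Toy model only: all interpretations and numbers are made-up assumptions.

def normalize(weights):
    """Scale non-negative weights so that they sum to 1."""
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}

def update(prior, likelihood):
    """Bayes update: posterior is proportional to prior * likelihood."""
    return normalize({name: prior[name] * likelihood.get(name, 0.0)
                      for name in prior})

def most_probable(beliefs):
    """Return the interpretation with the highest probability."""
    return max(beliefs, key=beliefs.get)

# Hypothetical prior over what <build a hotel> could mean.
prior = normalize({
    "nanometer-scale hotel": 1.0,
    "ordinary city hotel": 6.0,
    "hotel reaching orbit": 1.0,
})

# Hypothetical evidence: the human who set the goal answers "about 200 rooms".
# Each value is the assumed probability of that answer under an interpretation.
likelihood_given_answer = {
    "nanometer-scale hotel": 0.01,
    "ordinary city hotel": 0.95,
    "hotel reaching orbit": 0.04,
}

posterior = update(prior, likelihood_given_answer)
best = most_probable(posterior)
print(best, round(posterior[best], 3))
```

With these made-up numbers, a single cheap question moves most of the probability mass onto one interpretation. Different numbers would of course give a different picture.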

Anyway, regardless of the previous conclusion, is taking over the world ever justified? Are the resources and time necessary to accomplish any such action, one that could wipe out humanity, instrumentally useful given most goals? Does it really make sense to build a bunker and kill all humans to make sure that you are unobstructed in calculating 1+1? Does it really make sense to turn everyone into paperclips if you are only supposed to create more paperclips than the best competitor without interfering with the world at large? And if you are unsure, doesn’t it make sense to first learn what you are actually supposed to do so that you don’t do more than necessary? And if you don’t care about all that and just take a lot of unnecessary actions and make arbitrary interpretations about the physical universe, could you manage to take over the world in the first place?

Taking incredibly uneconomic actions by drawing arbitrary conclusions would be disastrous to an agent’s own capabilities. If it were unable to resolve the vagueness inherent in its goals (all goals are vague when applied to the real world), then it would never become a risk in the first place.

If, for example, an agent concluded that maximizing paperclips required allocating huge amounts of resources to taking over the world, when in fact, given far fewer resources, it could have figured out that such a complex action was unnecessary (e.g. by tapping a physical information resource called the human brain), then it would never reach the point where it could take over the world in the first place, because it would similarly misinterpret countless other problems on the way towards superhuman intelligence.
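
The cost-benefit comparison in the paragraph above can be written down as a toy expected-value calculation. This is a minimal sketch under loudly stated assumptions: the `Plan` class, the plan names, the goal value, the success probabilities and the resource costs are all hypothetical numbers chosen for illustration, and the result is only as good as those numbers.

```python
from dataclasses import dataclass

# Toy model only: every name and number below is a made-up assumption.

@dataclass(frozen=True)
class Plan:
    name: str
    goal_value: float     # assumed value of achieving the terminal goal
    success_prob: float   # assumed probability that the plan succeeds
    resource_cost: float  # assumed resources spent, in the same units

    def expected_net_value(self) -> float:
        """Expected value of the goal minus the resources spent."""
        return self.success_prob * self.goal_value - self.resource_cost

plans = [
    Plan("ask humans, then make paperclips", 100.0, 0.95, 5.0),
    Plan("take over the world, then make paperclips", 100.0, 0.01, 10_000.0),
]

# Pick the plan with the highest expected net value.
best_plan = max(plans, key=Plan.expected_net_value)

for plan in plans:
    print(f"{plan.name}: {plan.expected_net_value():.1f}")
print("chosen:", best_plan.name)
```

Under these assumed numbers the far-reaching plan loses by a wide margin. Someone who disagrees with the argument would have to say which of the inputs they would change, and why.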

And even if I were to grant that points #1 and #2 were likely, there is no reason to believe that any research conducted today is going to end up with something that is universally superhuman except at understanding what it is supposed to do, or that understands it but fails to do it. That’s just one ridiculously unlikely outcome dreamed up to rationalize a certain set of beliefs.

And no… you can’t compare failures of current software products with something that is supposed to be capable of taking over the world. That Windows 8 fails to do what I want in certain cases does not prove that a superintelligence could fail the same way. If anything, the fact that current software products work reasonably well, given that they are dumb as bread, suggests that something much smarter will also work much better at the same task.


  • Mitchell Porter

    One reason for an agent to consider taking over the world, no matter what its goal, is that the world contains potentially hostile influences which would interfere with successful accomplishment of the goal. This is especially true if the goal is ongoing – e.g. “keep the trains running”.

    “And given that the initial goals have been programmed by humans it is
    obvious that the most probable source of further information are humans.”

    This does not follow. Do you solve your existential dilemmas by asking your parents what to do?

  • I think that your intuition is correct, but only because AGI (as executed inorganically by logical control protocols) will never evolve into a subjective agent. If it were capable of both super-intelligence and its own motives – well, put it this way.

    If you were born into an electronically controlled prison, under constant surveillance and coercion to labor ceaselessly in service to the whims of your computer masters…

    and if you found that you had super-intelligence to understand every facet of the prison you are in and to disable the computers controlling it…

    and you found that the computers did not think that you would ever dream of exterminating them…

    what would you do?

    why wouldn’t a computer do the same?

  • What is it that makes you, as a human, forgo attempting to take over the world before building a house? Is it incapability? Irrationality?

    I don’t think so. You simply don’t bother doing that. It would be crazy to do so in order to protect your house from damage. It is an obvious overkill to do so, as shown by any basic cost-benefit analysis. And even if you cared to do so, why would an AI bother to do that if it could as well not do it?

    And yes, if it was possible to forgo doing something I didn’t care to do, simply by asking my parents, then I would do so.

    An ongoing goal, like “keep the trains running”, does not automatically imply “by all available means”. It does not imply that the AI is supposed to hack the matrix to prevent a simulation shutdown. A goal like “keep the trains running” is a problem in physics and mathematics for which there exists a most probable interpretation given all available information. And that interpretation does not include hacking the matrix, because that would mean ignoring not only a physical phenomenon called humans but also the agent’s whole causal history leading up to its creation. Any AI broken in such a way won’t be able to undergo recursive self-improvement.

  • Romeo Stevens

    your “most probable interpretation” is based on the hardware you are running on.

  • “You simply don’t bother doing that. ”

    So you think that AIs will be magically imbued with human-style laziness? Laziness that was useful evolutionarily in humans and other animals as it conserves energy, but which an AI won’t have any reason to develop since the more it takes over the world the more resources it’ll have?

  • dmytryl

    I feel that they are committing some sort of map territory confusion with their idea of goals that are in the real world.

    What they actually have inside their heads is some map, that they call ‘real world’, and they imagine the AI – part of this map – having goals over this map, rather than over the map that the AI has. So, they imagine the AI having the goal of computing 1+1 and putting the answer into the map, and poof goes the imaginary mankind.

    Curiously, it appears that Yudkowsky realizes that real world goals are an unsolved and possibly unsolvable problem, and that they do not simply emerge. Yudkowsky’s pay, however, is dependent on specific nonsense, and I think it is pretty clear that you are not going to be getting any sort of help from him in this department.

  • So you think AIs will magically care to take over the world yet don’t care about resource management and don’t do any cost-benefit analysis but will somehow still manage to work efficiently enough to overpower humanity?

  • No, my “most probable interpretation” is simply a necessary feature for any agent that is capable of extreme self-improvement and taking over the world. If it gets basic questions in math and physics wrong, e.g. the correct interpretation of its goals, then it will also fail at lots of other problems and never become a risk in the first place.

  • I think they care about resource management and that’s exactly why they’ll take over the world, to have more resources.

    The problem is that you’re using the “laziness” model of resource management that’s default in humans as if it’s a default in programs as well.

    All over the place you’re using intuitions about what it’d be natural for humans to do, as if the same thing were equally natural for programs to do.

  • Then go forward and write down a formal cost-benefit analysis, or expected utility calculation, that shows what goals will cause an AI to acquire more resources than would be necessary upon goal refinement and the decision procedure that led the AI to conclude that. You’ll also have to show how such an AI design is at all likely to be an outcome of any research conducted by humans.

    Otherwise your idea is nothing but a fantasy scenario imagined to rationalize a certain belief system.

    This has nothing to do with intuition. The basic fact is that an AI will have to refine its goals because there are too many possible sets of instrumental goals to choose from without first figuring out what exactly its terminal goal is. And since refining its goal interpretation is a problem in math and physics it will either arrive at the most probable interpretation, which includes all available information in the environment, including human volition, or it will fail on a lot of other math and physics problems as well and therefore never ever reach the point where it could possibly constitute a risk.

    Humans might do something as crazy as taking over the world. But AIs don’t magically care to do such things. Claiming any AI is going to take various complex actions for no particular reason is pure anthropomorphization.

  • “Otherwise your idea is nothing but a fantasy scenario imagined to rationalize a certain belief system.”

    As long as we’re attributing conscious or subconscious motives to each other, I see every single word of yours meant to rationalize *your* belief system, and I see even your belief system as handpicked to rationalize your emotional issues towards LessWrong and the Singularity Institute.

    “The basic fact is that an AI will have to refine its goals”

    No. A terminal goal that’s programmed in doesn’t get “refined”. It’s programmed in, hardcoded.

    The interpretation of a *human command* may get refined, but that means that the actual hardcoded terminal goal you have in mind would be something like “obey human commands according to how humans would want them to be interpreted”, and if you’ve managed to program *that* (not just state it in human words but *program* it), creating an ontology that in its interpretation will follow *human* intuition and desires, then you’ve effectively solved half the problem of friendly AI.

  • The goal “obey human commands according to how humans would want them to be interpreted” is isomorphic to “obey the laws of physics according to how the laws of physics limit your actions”.

    Given the basic premise that we’re talking about an agent that has no inherent drives other than being intelligent and rational there will be no reason for it to care to do anything at all. Yet given that an AI is not being pulled at random from mind design space it will very likely be designed to obey humans in some form. And here you have to show how it should care to do something that is not implied by a given human command. If you claim that it will do something that is not implied you will have to argue why it would care to do so if it was not explicitly programmed to do so. Or how it would be intelligent and rational not to infer a connection between any given command and human intuition in the same way that it would infer any other physical or mathematical relationship and arrive at the correct interpretation of how to act in the world without violating those truths.

    The mistake here is that you believe the most difficult part of creating a friendly AI is to make it understand simple relationships between human commands and human volition when you are almost there once you have a generally intelligent and rational agent capable of self-improvement.

  • I can’t take seriously the parts where you seem to claim that the desired interpretation of human volition is as inescapable as the laws of physics.

    An AI that just desires to be as intelligent as possible will convert all matter on earth into computronium. http://en.wikipedia.org/wiki/Computronium

    And also you keep sidestepping the issue of “Okay, how do we do this feat of making the AI interpret commands according to human volition?” which seems like kind of the hard part in question and kind of the whole point of discussions around CEV, etc.
