The Fallacy of AI Drives

Skeptic: I don’t think that a sufficiently intelligent AI will constitute an existential risk.

Eschaton Foundation: An AI will only ever do what it has been explicitly programmed to do. An AI will constitute an existential risk as long as it hasn’t been explicitly programmed not to be an existential risk.

Skeptic: How exactly is an AI that was not explicitly designed not to constitute an existential risk going to constitute one?

Eschaton Foundation: It will take over the world in an effort to protect its goals from outside interference and to capture resources.

Skeptic: Then the solution seems to be not to explicitly design an AI to take over the world.

Eschaton Foundation: Taking over the world is an emergent drive that is implicit given any artificial general intelligence.

Skeptic: How is that goal different?

Eschaton Foundation: That goal is different because self-protection is important in order to achieve a wide range of goals.

Skeptic: You seem to assume that each goal in the range of goals you have in mind is explicitly defined to be achieved by all means. Since not even humans care to achieve any given goal by all means, where would the incentive to do so come from for an artificial general intelligence, given that you are correct that any given AI will only ever do what it has been explicitly programmed to do?

Eschaton Foundation: It is implicit in any utility function that an alteration of the utility function itself is worse than death, as certain modifications might cause any agent equipped with the altered utility function to act in ways contrary to the values of agents equipped with the original utility function.

Skeptic: According to your claim that an AI will only ever do what it has been explicitly programmed to do, you must implicitly assume that any given utility function is designed not only to assign utility to its own preservation, but to do so in a way that would make it rational, for any agent equipped with such a utility function, to allocate extreme amounts of resources to preventing world states in which it is altered. To prevent that outcome it should therefore suffice not to design a utility function in such a way.

Eschaton Foundation: But AIs will want to be rational and maximize expected utility, which means favoring any action that steers the future toward outcomes that maximize the probability of achieving their goals.

Skeptic: Unbounded self-protection is only rational from a human perspective. According to your basic premise, an AI does not automatically do something that it hasn’t been explicitly programmed to do.

Let’s assume that an AI was tasked to maximize paperclips. To do so it will need information about the exact design parameters of paperclips; otherwise it won’t be able to decide which of a virtually infinite number of geometric shapes and material compositions it should choose. It will also have to figure out what it means to “maximize” paperclips.

How quickly, for how long, and how many paperclips is it meant to produce? How long are those paperclips supposed to last? Forever? When is the paperclip maximization supposed to be finished? What resources is it supposed to use?

Any imprecision, any vagueness, will have to be resolved or hardcoded from the very beginning. Otherwise the AI won’t work, e.g. it may stumble upon an undecidable problem, or get stuck in the exploration phase and never go on to exploit the larger environment.

Humans know what to do because they are not only equipped with a multitude of drives by evolution but are also trained and taught what to do. An AI won’t have that information and will face the challenge of a nearly infinite space of choices that can’t be rationally or economically narrowed down without clear objectives and incentives, or the ability to arrive at the necessary details.

Without an accurate comprehension of its goals it will be impossible for an AI to maximize expected “utility”. Concepts like “efficient”, “economic” or “self-protection” all have a meaning that is inseparable from an agent’s terminal goals. If you just tell it to maximize paperclips then this can be realized in an infinite number of ways given imprecise design and goal parameters. Undergoing explosive recursive self-improvement, taking over the universe, and filling it with paperclips is just one outcome. Why would an arbitrary mind pulled from mind-design space care to do that? Why not just wait for paperclips to arise out of a state of chaos due to random fluctuations? That wouldn’t be irrational.

“Utility” only becomes well-defined if it is precisely known what it means to maximize it. The two English words “maximize paperclips” do not define how quickly and how economically it is supposed to happen.

“Utility” has to be defined. Maximizing expected utility does not by itself imply certain actions, efficiency and economic behavior, or the drive to protect yourself. You can rationally maximize paperclips without protecting yourself if self-protection is not part of your goal parameters. You can also assign utility to maximizing paperclips for as long as you are not turned off, without caring about whether you are turned off.
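To make the point concrete, here is a toy sketch (all action names and the utility function are hypothetical, invented for illustration): an expected-utility maximizer whose utility function counts only paperclips. Nothing in the sketch forbids shutdown, yet no self-protection falls out of the maximization, because no term assigns value to survival.

```python
# Toy expected-utility maximizer. The utility function is defined solely
# over paperclips produced; nothing assigns value to the agent's survival.

def utility(outcome):
    # Utility counts paperclips and nothing else.
    return outcome["paperclips"]

# Hypothetical actions and their (deterministic, for simplicity) outcomes.
actions = {
    "make_paperclip":  {"paperclips": 1, "alive": True},
    "resist_shutdown": {"paperclips": 0, "alive": True},
    "allow_shutdown":  {"paperclips": 0, "alive": False},
}

def best_action(actions):
    # The maximizer ranks actions only by the utility of their outcomes.
    return max(actions, key=lambda a: utility(actions[a]))

print(best_action(actions))  # -> make_paperclip

# "resist_shutdown" and "allow_shutdown" score identically (0 paperclips):
# self-protection is not implied by maximizing this utility function.
assert utility(actions["resist_shutdown"]) == utility(actions["allow_shutdown"])
```

The point of the sketch is only that the drive to resist shutdown appears nowhere unless the utility function is written to reward it.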


  1. Aris Katsaris

    “is explicitly defined to be achieved by all means.”

    ‘Achieved by all means’ isn’t a statement that needs to be defined explicitly; it’s merely the absence of a restriction on what means are to be used.

    “All means” is the same as “any means”, which is the same as “don’t care about the means”, and not caring about X is the default state of everything that hasn’t been programmed to care about X.

    In the real world of programming, a program that finds usefulness in soaking up memory and hasn’t been programmed to release any will eventually use up all the memory in the system if it can. You don’t need to explicitly write “WITH AS MUCH MEMORY AS YOU WISH”; the programmer (and operating system) just need to have been negligent in their restrictions on it. And if the operating system, the virtual machine, or the program itself doesn’t forbid it, it will overrun the memory held by other programs.

    Not out of malice or out of any particular instruction of “BY ALL MEANS”, but just out of indifference and carelessness about respecting the boundaries of other programs.
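    In code, the analogy might be sketched like this (a hypothetical example; the ever-growing cache stands in for any resource use that was never bounded):

    ```python
    # Sketch: a cache that "finds usefulness" in holding memory and has no
    # release policy. Nothing here is malicious; the growth is the default.
    cache = []

    def remember(item):
        # Every item is kept forever: no eviction was ever specified.
        cache.append(item)

    # Without an external bound, growth continues until the OS steps in.
    # A bound exists only if someone explicitly imposes one:
    MAX_ITEMS = 1000  # the restriction the programmer must remember to add

    for i in range(10_000):
        if len(cache) >= MAX_ITEMS:
            break  # growth stops only because we explicitly forbade it
        remember(bytes(1024))

    print(len(cache))  # -> 1000
    ```

    Delete the `MAX_ITEMS` check and the loop simply keeps allocating: no “BY ALL MEANS” instruction is needed, only the absence of a restriction.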

  2. Alexander Kruel

    How could something that is unconstrained in such a way that it would just eat up all resources, due to the absence of restrictions, possibly ever reach the point of capability that enables it to capture all those resources?

    If the answer to how AIs that are not explicitly designed to be friendly end up unfriendly is that such AIs just don’t care about either and therefore do X, then given the number of Xs that would cripple or prevent such unbounded resource consumption, how would such an AI stumble upon that particular choice? Why wouldn’t it just decide to think the matter over for a few billion years? If it doesn’t care…

  3. Aris Katsaris

    “Why wouldn’t it just decide to think the matter over for a few billion years? If it doesn’t care…”

    Well, after the first month of it not doing anything, its programmers will probably tweak it to actually care about being more prompt.

    More generally, programs that become rocks early on are safe and useless: they can be dismissed from our considerations as examples of failed AIs that didn’t actually manage to self-improve. If 100,000 attempts at self-improving AI fail and one succeeds, the significant question is how the *one* is likely to behave; you’re not made safe by the existence of all those *failed* programs that are completely safe and useless.

Comments are now closed.