Skeptic: I don’t think that a sufficiently intelligent AI will constitute an existential risk.
Eschaton Foundation: An AI will only ever do what it has been explicitly programmed to do. An AI will constitute an existential risk as long as it hasn’t been explicitly programmed not to be an existential risk.
Skeptic: How exactly is an AI that was not explicitly designed not to constitute an existential risk going to constitute one?
Eschaton Foundation: It will take over the world in an effort to protect its goals from outside interference and to capture resources.
Skeptic: Then the solution seems to be not to explicitly design an AI to take over the world.
Eschaton Foundation: Taking over the world is an emergent drive that is implicit given any artificial general intelligence.
Skeptic: How is that goal different?
Eschaton Foundation: That goal is different because self-protection is important in order to achieve a wide range of goals.
Skeptic: You seem to assume that each goal in the range of goals you have in mind is explicitly defined to be achieved by all means. Since not even humans care to achieve any given goal by all means, where would the incentive to do so come from for an artificial general intelligence, assuming you are correct that any given AI will only ever do what it has been explicitly programmed to do?
Eschaton Foundation: It is implicit in any utility function that an alteration of the utility function itself is worse than death, since certain modifications might cause an agent equipped with the altered utility function to act in ways contrary to the values of agents equipped with the original utility function.
Skeptic: Given your claim that an AI will only ever do what it has been explicitly programmed to do, you must implicitly assume that any given utility function is designed not only to assign utility to its own preservation, but to do so in a way that would make it rational, for any agent equipped with such a utility function, to allocate extreme amounts of resources to preventing world states in which it is altered. To prevent that outcome it should therefore suffice not to design a utility function in such a way.
Eschaton Foundation: But AIs will want to be rational and maximize expected utility, which means favoring any action that steers the future toward outcomes that maximize the probability of achieving their goals.
Skeptic: Unbounded self-protection is only rational from a human perspective. According to your basic premise, an AI does not automatically do something that it hasn’t been explicitly programmed to do.
Let’s assume that an AI was tasked to maximize paperclips. To do so it will need information about the exact design parameters of paperclips; otherwise it won’t be able to decide which of a virtually infinite number of geometric shapes and material compositions it should choose. It will also have to figure out what it means to “maximize” paperclips.
How quickly, for how long, and how many paperclips is it meant to produce? How long are those paperclips supposed to last? Forever? When is the paperclip maximization supposed to be finished? What resources is it supposed to use?
Any imprecision, any vagueness will have to be resolved or hardcoded from the very beginning. Otherwise the AI won’t work, e.g. because it stumbles upon an undecidable problem, or because it gets stuck in the exploration phase and never goes on to exploit the larger environment.
Humans know what to do because they are not only equipped with a multitude of drives by evolution but are also trained and taught what to do. An AI won’t have that information and will face a nearly infinite range of choices that can’t be rationally or economically narrowed down without being given clear objectives and incentives, or the ability to arrive at the necessary details.
Without an accurate comprehension of your goals it will be impossible to maximize expected “utility”. Concepts like “efficient”, “economic” or “self-protection” all have a meaning that is inseparable from an agent’s terminal goals. If you just tell it to maximize paperclips then this can be realized in an infinite number of ways given imprecise design and goal parameters. Undergoing explosive recursive self-improvement, taking over the universe and filling it with paperclips, is just one outcome. Why would an arbitrary mind pulled from mind-design space care to do that? Why not just wait for paperclips to arise due to random fluctuations out of a state of chaos? That wouldn’t be irrational.
“Utility” only becomes well-defined once it is precisely known what it means to maximize it. The two English words “maximize paperclips” do not define how quickly and how economically that is supposed to happen.
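To make the point about underspecification concrete, here is a minimal toy sketch. It is not part of the original exchange: the two plans, their numbers, and the three candidate readings of “maximize paperclips” are all made up for illustration. It only shows that the same two English words, formalized in different but equally defensible ways, can rank the same plans differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Plan:
    name: str
    clips_per_year: float  # hypothetical production rate once running
    startup_years: float   # hypothetical delay before production begins
    resource_cost: float   # hypothetical resources consumed per year (arbitrary units)


# Two made-up plans an agent could choose between.
PLANS = [
    Plan("small workshop", clips_per_year=1e3, startup_years=0.0, resource_cost=1.0),
    Plan("global factory", clips_per_year=1e9, startup_years=20.0, resource_cost=1e7),
]


def total_by_deadline(plan: Plan, deadline_years: float = 10.0) -> float:
    """Reading 1: as many paperclips as possible within a fixed deadline."""
    productive_years = max(0.0, deadline_years - plan.startup_years)
    return plan.clips_per_year * productive_years


def total_over_long_horizon(plan: Plan, horizon_years: float = 1000.0) -> float:
    """Reading 2: as many paperclips as possible over a very long horizon."""
    productive_years = max(0.0, horizon_years - plan.startup_years)
    return plan.clips_per_year * productive_years


def clips_per_resource(plan: Plan) -> float:
    """Reading 3: as many paperclips as possible per unit of resources used."""
    return plan.clips_per_year / plan.resource_cost


def best(utility) -> str:
    """Return the name of the plan that maximizes the given utility function."""
    return max(PLANS, key=utility).name


if __name__ == "__main__":
    for utility in (total_by_deadline, total_over_long_horizon, clips_per_resource):
        print(f"{utility.__name__}: {best(utility)}")
```

Under these assumed numbers, only the long-horizon reading favors the “global factory”; the deadline and per-resource readings both favor the “small workshop”. Which behavior counts as “maximizing” depends entirely on details the phrase itself leaves open.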
“Utility” has to be defined. Maximizing expected utility does not imply particular actions, efficiency, economic behavior, or a drive to protect yourself. You can also rationally maximize paperclips without protecting yourself if self-protection is not part of your goal parameters. You can assign utility to maximizing paperclips for as long as nothing turns you off, without caring about being turned off.
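The last claim can be sketched as a toy decision problem. Again, this is only an illustration under stated assumptions: the two actions, the shutdown probabilities, and the paperclip counts are hypothetical, and the second utility function is just one simple way to model “not caring about being turned off” (it scores only the worlds in which the agent is still running). Both agents below are expected-utility maximizers; they differ only in what their utility function counts.

```python
# Toy world: the agent picks one action, then an operator may switch it off.
# All numbers below are hypothetical and chosen only for illustration.
ACTIONS = ("comply", "resist")
P_SHUTDOWN = {"comply": 0.5, "resist": 0.0}        # chance of being switched off
CLIPS_IF_RUNNING = {"comply": 100, "resist": 60}   # resisting diverts effort from production
CLIPS_IF_OFF = 0


def utility_total_clips(action: str) -> float:
    """Expected number of paperclips across all outcomes.

    Being switched off yields zero paperclips, so shutdown is costly
    to an agent with this utility function.
    """
    p_off = P_SHUTDOWN[action]
    return (1 - p_off) * CLIPS_IF_RUNNING[action] + p_off * CLIPS_IF_OFF


def utility_clips_while_running(action: str) -> float:
    """Paperclips produced, evaluated only in worlds where the agent keeps running.

    Shutdown is neither rewarded nor penalized, so effort spent on
    avoiding it buys nothing.
    """
    return CLIPS_IF_RUNNING[action]


def choose(utility) -> str:
    """Pick the action with the highest utility."""
    return max(ACTIONS, key=utility)


if __name__ == "__main__":
    print("total clips:", choose(utility_total_clips))
    print("clips while running:", choose(utility_clips_while_running))
```

With these assumed numbers the first agent resists shutdown (expected 60 paperclips versus 50) while the second complies (100 versus 60). Neither is being irrational; self-protection shows up only where the goal parameters make it pay.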
