Human volition as a resource to resolve vagueness

(Here are two more comments from a Facebook chat. Although this has all been outlined in my post here and Richard Loosemore’s post here, rephrasing it might be helpful for those who either don’t read such long posts or don’t understand them.)

Previously: AI Risk Caveats

Omohundro’s AI drives are detached from reality because some of the drives he mentions, like unbounded self-protection, are only rational from a human perspective. An AI does not automatically feature a drive to protect itself. Although it might seem plausible that a sufficiently intelligent AI would conclude that it can only achieve its goals if it does everything to protect its agency, this misses the point that an AI will only arrive at such a conclusion if self-protection is either explicitly defined as part of its goal specification or can be inferred from facts about the environment, such as human volition.

In other words, unbounded self-protection will only be an outcome if an AI is specifically designed to achieve world states by, among other actions, protecting itself in a concrete way. Either unbounded self-protection is an explicit part of the AI’s workings or it can be implicitly inferred.

What is very probable is that a drive to “take over the world” will not be an explicit part of an AI’s architecture. The question then becomes how an AI would come to conclude, or “care” if you like, to protect itself to the extent of “taking over the universe” or even its local neighborhood. That question leads directly to how any possible artificial general intelligence is motivated to refine its “goals”, i.e. to reduce vagueness.

Let’s assume that an AI was tasked to maximize paperclips. To do so it will need information about the exact design parameters of paperclips; otherwise it won’t be able to decide which of a virtually infinite number of geometric shapes and material compositions it should choose. It will also have to figure out what it means to “maximize” paperclips. How quickly, for how long, and how many paperclips is it meant to produce? How long are those paperclips supposed to last? Forever? When is the paperclip maximization supposed to be finished? What resources is it supposed to use?

Any imprecision, any vagueness will have to be resolved or hardcoded from the very beginning. Otherwise the AI won’t work, e.g. by stumbling upon an undecidable problem or getting stuck in the exploration phase and never going on to exploit the larger environment.
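To make the problem concrete, here is a toy sketch in Python (the parameter names and values are invented purely for illustration, not a claim about how a real system would be specified): an agent handed an underspecified goal has no basis for choosing among the candidate plans and simply cannot act until every gap is filled.

```python
from dataclasses import dataclass, fields
from typing import List, Optional

@dataclass
class PaperclipGoal:
    # Every field left as None is a piece of vagueness the agent cannot
    # resolve on its own: the specification simply doesn't say.
    shape: Optional[str] = None                 # which of countless geometries?
    material: Optional[str] = None              # steel, copper, something exotic?
    production_rate: Optional[float] = None     # how quickly?
    total_quantity: Optional[int] = None        # how many before "maximize" is done?
    durability_years: Optional[float] = None    # how long must they last?
    allowed_resources: Optional[List[str]] = None  # what may be consumed?

def choose_action(goal: PaperclipGoal) -> str:
    unresolved = [f.name for f in fields(goal) if getattr(goal, f.name) is None]
    if unresolved:
        # With any parameter undetermined, there is no basis for preferring
        # one of the virtually infinite candidate plans over another.
        return f"cannot act: unresolved parameters {unresolved}"
    return "begin production"  # only reachable once all vagueness is gone

print(choose_action(PaperclipGoal()))  # stuck: everything is still vague
print(choose_action(PaperclipGoal(
    "standard loop", "steel", 100.0, 10**6, 10.0, ["scrap metal"])))  # fully specified
```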

Suppose the AI is explicitly built to use the environment to dissolve any vagueness. What is the most likely place to find answers about what it is supposed to do? Human volition!

In short, either everything an AI is supposed to do is already explicitly hardcoded, in which case it won’t be an existential risk as long as nobody manages to explicitly make it one. Or the AI somehow has to figure out what it is meant to do, in which case it either won’t care to do so and thereby fails to work at all, or it will have to look for information within the environment about what it is meant to do, and in that case human volition is the obvious source.
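As a minimal, purely hypothetical sketch of that dichotomy: either the goal specification arrives with every detail hardcoded, or the gaps are filled by querying an external source of answers, which here stands in for human volition. The `ask_humans` callable is an invented placeholder for whatever channel the agent would use to read human volition; nothing here claims such a channel exists.

```python
from typing import Any, Callable, Dict, Optional

def resolve_vagueness(goal_spec: Dict[str, Optional[Any]],
                      ask_humans: Callable[[str], Any]) -> Dict[str, Any]:
    """Fill every unspecified parameter from an external source of answers."""
    return {name: (value if value is not None else ask_humans(name))
            for name, value in goal_spec.items()}

# Option 1: everything hardcoded up front, so no questions ever need asking.
hardcoded = {"shape": "standard loop", "material": "steel", "quantity": 10**6}

# Option 2: the spec is left vague and the gaps are filled by asking humans.
vague = {"shape": None, "material": None, "quantity": 10**6}
human_answers = {"shape": "standard loop", "material": "steel"}

resolved = resolve_vagueness(vague, lambda name: human_answers[name])
print(resolved)  # {'shape': 'standard loop', 'material': 'steel', 'quantity': 1000000}
```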

Humans know what to do because they are not only equipped with a multitude of drives by evolution but are also trained and taught what to do. An AI won’t have that information and will face the challenge of nearly infinite choice, which can’t be rationally or economically resolved without clear objectives and incentives, or the ability to arrive at the necessary details. And the only way to remove that vagueness is to tap an adequate source of information, which is human volition.


  1. Tim Tyler

    This seems like an argument against http://matchingpennies.com/universal_instrumental_values/ as well. Except that it doesn’t look like much of an argument to me. Universal Instrumental Values is a reasonable concept, and self-protection is fairly-obviously a U.I.V. There *are* some self-sacrificing agents out there – but we understand that they exist due to kin selection.

Comments are now closed.