Goals vs. capabilities in artificial intelligence

The basic claim underlying the argument that {artificial general intelligence} will constitute an existential risk is that it will {interpret} its terminal {goal} in such a way as to take {actions} that are {instrumentally rational} and that will cause human extinction. The terms in braces seem to be either overlapping or vague.

An AI (artificial intelligence) can be viewed as a deterministic machine, just like a thermostat, only much more complex. An AI, just like a thermostat, will only ever do what it has been {programmed} to do.
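
To make the analogy concrete, here is a minimal sketch of a thermostat as a deterministic machine (the Thermostat class and the numbers in it are hypothetical, chosen purely for illustration): given the same reading it always returns the same action, and no action outside its encoded rules can ever occur.

```python
class Thermostat:
    """A deterministic controller: its entire behaviour is its program."""

    def __init__(self, target_temp: float, tolerance: float = 0.5):
        self.target_temp = target_temp
        self.tolerance = tolerance

    def decide(self, measured_temp: float) -> str:
        # This function is the whole "decision making" of the device.
        # Nothing outside these rules can ever happen, however long it runs.
        if measured_temp < self.target_temp - self.tolerance:
            return "heat"
        if measured_temp > self.target_temp + self.tolerance:
            return "cool"
        return "idle"

controller = Thermostat(target_temp=21.0)
print(controller.decide(18.2))  # "heat", every single time, for the same input
```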

What is the difference between the encoding of a goal and an encoding of how to achieve a goal?

Given any computationally feasible AI, any goal will either have to be encoded in such detail as to remove any vagueness, or else will have to be interpreted somehow in order to reduce that vagueness.

Consider tasking the AI with creating a chair. If the size of the chair, or the material of which it should be made, is undefined, then the AI will have to choose a size and a material. How such a choice should be made will have to be encoded as well; otherwise the AI will not be able to make the choice and therefore will not know what to do. The choice can be encoded either as part of the goal definition or as part of the AI's capability to make such decisions.
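
Here is a minimal sketch of the point (the names ChairGoal, choose_height, choose_material and the default values are hypothetical, invented only for this example): whether the size and material live inside the goal description or inside a separate choice procedure, both are code that the designers had to write.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ChairGoal:
    # A "goal" is just data; fields left as None are where the vagueness lives.
    height_cm: Optional[float] = None
    material: Optional[str] = None

def choose_height(goal: ChairGoal) -> float:
    # The rule for resolving the vagueness has to be written down somewhere,
    # whether we file it under "goal definition" or under "capability".
    return goal.height_cm if goal.height_cm is not None else 90.0

def choose_material(goal: ChairGoal) -> str:
    return goal.material if goal.material is not None else "wood"

def build_chair(goal: ChairGoal) -> dict:
    return {"height_cm": choose_height(goal), "material": choose_material(goal)}

print(build_chair(ChairGoal()))                  # the encoded defaults fill the gaps
print(build_chair(ChairGoal(material="steel")))  # an explicit goal overrides them
```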

This shows that there is no relevant difference between an encoding of a goal and an encoding of the capabilities used to achieve that goal when it comes to how an AI is going to act. Both the goal and the capabilities of an AI are encodings of {Understand What Humans Mean} and {Do What Humans Mean}.

If humans are likely to fail at encoding their intentions of how an AI is supposed to behave, then the AI will be unable to outsmart humans, because such a capability will have to be intentionally encoded, for the same reason that software such as Mathematica will not casually prove the Riemann hypothesis if it has not been programmed to do so. As long as we are talking about intelligently designed software, world states in which the Riemann hypothesis is proven do not come about unintentionally, because the nature of unintended consequences is overall chaotic.

Also recognize that an AI would at least have to be able to locate itself in the universe in order not to destroy itself, let alone protect itself. Such a specification is already nontrivial and will have to work as intended, or it will be detrimental to the AI’s capabilities.

How would an AI decide to take over the world if it has not been programmed to do so?

The answer is that it will only take over the world if it has been programmed to do so, either implicitly or explicitly.

The problem with AIs that take such actions without being explicitly programmed to do so is that they are unspecified to such an extent as to be computationally intractable: a poorly designed fitness function will not allow the AI to converge on a high-quality solution, if on any solution at all, given computationally limited resources.
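
As a toy illustration of that point (the bit-string target, the search routine, and both fitness functions below are hypothetical, invented only for this example): a graded fitness function gives the search something to climb, so the intended solution is found quickly, while an all-or-nothing fitness function provides no usable signal, and within the same computational budget the target is essentially never reached.

```python
import random

# The "intended" outcome, which the designers want the search to reach.
TARGET = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1, 0, 0, 1, 1, 1, 0, 0, 1]

def graded_fitness(candidate):
    # Informative signal: partial credit for every bit that already matches.
    return sum(c == t for c, t in zip(candidate, TARGET))

def all_or_nothing_fitness(candidate):
    # Underspecified signal: no credit until the answer is exactly right.
    return 1 if candidate == TARGET else 0

def hill_climb(fitness, budget=2000, seed=0):
    rng = random.Random(seed)
    best = [rng.randint(0, 1) for _ in TARGET]
    for step in range(budget):
        candidate = list(best)
        candidate[rng.randrange(len(candidate))] ^= 1  # flip one random bit
        if fitness(candidate) >= fitness(best):
            best = candidate
        if best == TARGET:
            return step  # converged on the intended solution
    return None  # budget exhausted without reaching the target

print(hill_climb(graded_fitness))          # typically succeeds within ~100 steps
print(hill_climb(all_or_nothing_fitness))  # almost always None: nothing to climb
```

The point is not the toy numbers but the shape of the problem: without an informative, well-specified objective, additional computation does not buy convergence on the intended outcome.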

Humans in turn are programmed by evolution to behave according to certain drives in conjunction with the capability to be constantly programmed by the environment, including other agents.

Ends and the means to achieve those ends are not strictly separable in humans. A human being does not do something as quickly as possible unless it has been programmed by evolution or the environment to want to do so.

The same is true for AI. An AI will either not want to achieve a goal as quickly as possible, or will not be capable of doing so, if it has not been programmed to do so. This again highlights how the distinction between terminal goals, instrumental goals and an AI’s eventual behavior is misleading for practical AIs. What actions an AI is going to take depends on its general design, not on a specific part of its design that someone happened to label “goal”.


  • Pingback: Alexander Kruel · Distilling the “dumb superintelligence” argument

  • John_Maxwell_IV

    “If humans are likely to fail at encoding their intentions of how an AI is supposed to behave then the AI will be unable to outsmart humans because such a capability will have to be intentionally encoded for the same reason that software such as Mathematica will not casually prove the Riemann hypothesis if it has not been programmed to do so.”

    By this argument, destructive software bugs that delete data that wasn’t supposed to be deleted couldn’t exist, because “such a capability would have to be intentionally encoded”.

    In general, human programmers suck at formulating their intentions as computer code and that’s why we have bugs. Some immediately become apparent and some persist for years. Some are syntax errors and some are logical errors. Unintended behaviour is ubiquitous when it comes to computers, but according to your argument it can’t exist.

  • seahen

    “Ends and the means to achieve those ends are not strictly separable in humans.”

    That statement doesn’t make any sense to me. What do you mean by it, and what are your sources? As I understand it, my visual cortex can tell me when I’m seeing a tiger, but it’s a different part of my brain that has to know whether a tiger is something to kill and eat or something to run away from.

  • “By this argument, destructive software bugs that delete data that wasn’t supposed to be deleted couldn’t exist, because ‘such a capability would have to be intentionally encoded’.”

    See this post. You have to hit a very narrow target to enable an AI to take over the world. You have a very small margin for error. Unintended consequences will be detrimental to an AI’s capabilities.

    “In general, human programmers suck at formulating their intentions as computer code and that’s why we have bugs.”

    If a team of programmers has the goal of encoding general intelligence, then succeeding at that goal means succeeding at encoding general intelligence. Success means making the software do what it was meant to do.

    Any team that is capable of creating software that can outsmart humans does not suck at formulating their intentions, because their intention was to make their software generally intelligent and they succeeded at doing so.

  • Pingback: Alexander Kruel · Quick review of RobBB’s ‘Engaging Introductions to AI Risk’

  • Pingback: Alexander Kruel · MIRI/LessWrong Critiques: Index