AI risks as a result of vagueness

Recent comments here and on Facebook reminded me of what kind of crazy AI the Singularity Institute must imagine when trying to come up with a scenario that supports their mission. But then I realized again that the real problem here is that they actually don’t imagine any specific AI at all. Their whole mission is an artifact of too much vagueness. The result is the prediction of a process that has more in common with out-of-control self-replicating robots, i.e. “grey goo”, than with an actual general intelligence.

Some features of the AI that they seem to have in mind:

1.a The AI is eventually going to interpret any natural language request in an almost completely arbitrary manner, yet biased in a way that guarantees damage great enough to cause human extinction.

1.b The AI will arrive at the correct interpretation of a natural language request if it is necessary to deceive humans.

2.a The AI is either not going to compute a cost-benefit analysis to choose which goals are instrumentally useful in executing a natural language request, or any cost-benefit analysis, regardless of the nature of the request, is going to result in actions that cause damage great enough to cause human extinction.

2.b If it is useful in deceiving humans then the AI will do a cost-benefit analysis resulting in actions that appear to be perfectly aligned with human volition, just so that it can later follow through on some completely arbitrary but dangerous interpretation.

It should be obvious that those features are explicitly engineered to yield the desired result, that AI is an existential risk, rather than being an evidence-based prediction of how real-world AI is going to behave.

The problem is that the whole AI risk movement is all talk, no walk. Their predictions are based on intuition, not on knowledge of real-world AI. Their ideas are full of vague terminology and unjustified assertions.

The whole idea that an AI is going to care to protect itself by all means is pure anthropomorphization.



  • http://profiles.google.com/katsaris Aris Katsaris

    A self-modifying AI that doesn’t care to protect itself at *all*, ends up overwriting itself into nothingness and getting erased.

    So what exactly is the limit of methods you imagine that an AI will use to protect itself if it’s above “zero” and beneath “by all means necessary” — if it hasn’t explicitly been programmed to know such a limit (hardcoded or derived) to the means it’s allowed to use?

    Would it destroy a virus program to protect its existence?
    Would it kill a suicide bomber that was attempting to blow it up?
    How about the group that sent the suicide bomber, or the nation that was financing them?

    Would it kill an ant in pursuit of its instructions to build a hospital?
    Would it kill a cow?
    Would it kill a human being who was obstructing via protest at the site of the hospital’s construction?
    How about humans who were just *planning* to obstruct such via protest?

    So what are the particular lines you’re drawing between what it can and can’t do in pursuit of its goals?

    Again you don’t seem to understand that “by all means” and “by no means” are the only two options that do *not* need to be especially defined — as they both refer to the empty set in different ways.

  • http://kruel.co/ Alexander Kruel

    At what point did humans invent civilization? It was neither close to the minimum point of intelligence necessary to do so nor near the maximum intelligence possible. How is an AI likely to end up with the maximum possible self-protection mechanism, where it tries to hack the matrix to prevent a possible simulation shutdown?

  • dmytryl

    > A self-modifying AI that doesn’t care to protect itself at *all*, ends up overwriting itself into nothingness and getting erased.

    That’s a perfect example of this anthropomorphization. You don’t need a concept of self to ‘self improve’; you just make your software design superior software, perhaps with the current source code as a hint. Who the hell modifies running programs on the fly, anyway? It’s an extra unnecessary complication.

    Real-world goals are an unsolved problem. There’s not even a theoretical formalism. For all we know, real-world goals are impossible even in principle and all that can be done is fake, limited imitations. If you take a current self-driving car and modify it to make use of a computer with 3^^^3 floating point operations per second and that much memory, it’ll just find a very good path; it won’t find a way to kill everyone to decrease traffic congestion. That is not a part of the search space, and it is fairly difficult to make it a part of the search space.

  • Anon

    One potential problem: the AI can have a good model of the human mind, but that doesn’t necessarily mean that you could tell the AI to “just do what humans want”. The human brain is made up of a lot of atoms. Maybe you can predict precisely how those atoms behave, but how are you going to precisely specify what the atoms “want”? Especially if your initial directive to the AI is something like “wake up, talk to the humans, figure out what they want” without it having any experience with humans yet. What if something else, like a computer somewhere, gets pattern-matched as being a “human” because your definition wasn’t good enough?

    Figure the AI has some goal, based on how you programmed it. If you didn’t program it carefully enough, it will do what you say but not what you meant, like any computer program. It wakes up, builds a perfect model of the world (including the humans in the world), and attempts to execute its goal. Given the presence of the humans, it will have to work around them, and that may mean deception. Does that make sense?