Distilling the “dumb superintelligence” argument

Someone posted a distilled version of the argument that I tried to outline in some of my previous posts. In this post I try to refine the argument even further.

Note: In this post AI stands for artificial general intelligence.

(1) An AI will not be pulled at random from mind design space but instead be designed by humans.

(2) If an AI is meant to behave in a generally intelligent way then it will have to work as intended or otherwise fail to be generally intelligent.[1]

(3) A significant part of general intelligence consists of deriving exact meaning from fuzzy meaning.[2]

(4) An AI that lacks the capacity described in step 3 cannot take over the world.

(5) By step 1, what an AI is meant to do will originate from humans.

(6) If not otherwise specified, an AI will always make use of the capacity required by step 3.[3]

(7) By step 6, an AI will not be confused about what it is meant to do.[4]

(8) Therefore the intersection of the sets of “intelligently designed AIs” and “dangerous AIs” only contains those AIs which are deliberately designed to be dangerous by malicious humans.[5]


Notes

[1] An AI is the result of a research and development process. A new generation of AIs needs to be better than other products at “Understand What Humans Mean” and “Do What Humans Mean” in order to survive the research phase and subsequent market pressure.

[2] When producing a chair, an AI will have to either know the specifications of the chair (such as its size or the material it is supposed to be made of) or else know how to choose a specification from an otherwise infinite set of possible specifications. Given a poorly designed fitness function, or the inability to refine its fitness function, an AI will either (a) not know what to do or (b) converge on a satisfactory solution only slowly, if at all, given limited computational resources.
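To make case (b) concrete, here is a minimal toy sketch of a deliberately underspecified fitness function. Everything in it (naive_fitness, hill_climb, the leg and seat-height rewards) is hypothetical and invented only for this illustration; the point is that an optimizer can satisfy the letter of a fuzzy specification while missing its intended meaning.

```python
# Toy illustration of footnote [2]: a poorly specified fitness function.
# All names and numbers are hypothetical; this describes no real AI system.
import random

random.seed(0)

def naive_fitness(legs, seat_height_cm):
    # Intended meaning: "a normal, usable chair".
    # Actual encoding: more legs and a taller seat are always better.
    return legs + 0.1 * seat_height_cm

def hill_climb(steps=1000):
    spec = {"legs": 4, "seat_height_cm": 45}  # start from a sensible chair
    for _ in range(steps):
        candidate = dict(spec)
        attr = random.choice(list(candidate))           # mutate one attribute
        candidate[attr] = max(1, candidate[attr] + random.choice([-1, 1]))
        if naive_fitness(**candidate) >= naive_fitness(**spec):
            spec = candidate                            # keep any non-worse design
    return spec

print(hill_climb())  # roughly 250 legs and a seat close to 3 m high
```

The optimizer does exactly what the fitness function says, which is not what the designer meant by “chair”; turning the fuzzy intention into an exact specification is the capacity that step 3 refers to.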

[3] An AI can be viewed as a deterministic machine, just like a thermostat, only much more complex. An AI, just like a thermostat, will only ever do what it has been programmed to do.
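As a minimal sketch of the thermostat analogy (the class, target temperature, and thresholds below are invented for this illustration), the point is only that the machine’s behaviour is fully determined by its programmed rule: the same inputs always produce the same outputs.

```python
# Toy illustration of footnote [3]: a thermostat as a deterministic machine.
# The class and the numbers are hypothetical, chosen only for this sketch.
class Thermostat:
    def __init__(self, target_c=21.0, hysteresis=0.5):
        self.target_c = target_c
        self.hysteresis = hysteresis

    def decide(self, measured_c):
        # The device only ever does what this rule prescribes, nothing else.
        if measured_c < self.target_c - self.hysteresis:
            return "heat on"
        if measured_c > self.target_c + self.hysteresis:
            return "heat off"
        return "no change"

t = Thermostat()
print([t.decide(c) for c in (19.0, 21.0, 23.0)])
# -> ['heat on', 'no change', 'heat off'], every time, for these inputs
```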

[4] If an AI was programmed to be generally intelligent then it would have to be programmed to be selectively stupid in order to fail at doing what it was meant to do while acting generally intelligently at doing what it was not meant to do.

[5] “The two features <all-powerful superintelligence> and <cannot handle subtle concepts like “human pleasure”> are radically incompatible.” (The Fallacy of Dumb Superintelligence)

Further reading

An improved version of the above argument can be found here.


  • John_Maxwell_IV

    “If an AI is meant to behave in a generally intelligent way then it will have to work as intended or otherwise fail to be generally intelligent.”

    “Working as intended” and “behaving in a generally intelligent way” seem like orthogonal concepts that should not both go under the same label. MIRI’s entire argument is that there exist generally intelligent AIs that don’t behave as their creators intended, and in fact fully intended behaviour is a very small target to hit. Your answer to this argument is that such AIs “won’t survive the research phase and subsequent market pressure”. This hardly seems like something to bet the species on, especially if self-improvement curves have the wrong shape.

  • “Working as intended” and “behaving in a generally intelligent way” seem like orthogonal concepts that should not both go under the same label.

    I don’t think so. As I outlined in my previous post, the two concepts do not differ in any relevant way.

    MIRI’s entire argument is that there exist generally intelligent AIs that don’t behave as their creators intended, and in fact fully intended behaviour is a very small target to hit.

    The hardest part is to make an AI capable of becoming generally intelligent and then improving itself. Those capabilities will have to be intentionally programmed and work as intended.

    I don’t see any good reason to believe in the conjunction of succeeding at encoding all the capabilities that are necessary to understand humans while failing in such a way as to make the AI use those capabilities on all problems except understanding what it is meant to do.

  • Xagor et Xavier

    If MIRI’s argument is that fully intended behaviour is a very small target to hit, I’d say the response of my distillation is that the AI already comes equipped with a capacity to make very small targets easier to hit: it’s part of what we call “intelligence”.

    So the designer can say “Do what I mean: produce paperclips”. If the AI says “I don’t know what that means”, the designer can say “Use your intelligence to understand”. Why shouldn’t the AI manage to use the same fuzzy-to-exact parsing skill it uses to find out what a paperclip is, to find out that when the human says “do what I mean”, he doesn’t want an Earth made up of paperclips?

    A bad designer could indeed hard-wire the objective function to produce a perverse paperclip-mania outcome. But there’s no reason for him to do so, and it could very well be more tedious than doing it the right way. In the nuclear plant example, a designer who restricts his AI to only building PWRs has to codify what a PWR is. A designer who leaves the AI free to infer a reactor design on its own doesn’t.

  • John_Maxwell_IV

    “I don’t think so. As I outlined in my previous post, the two concepts do not differ in any relevant way.”

    It’s possible for an AI to work in some of the ways the creators intend but not all of them. I could write a computer program that behaves correctly when I try to use one feature but not when I try to use another (perhaps much more difficult to code) feature. In the same way, it’s possible to have a program that behaves as its creator intended in the sense that it’s generally intelligent but not in the sense that it’s friendly.

    “I don’t see any good reason to believe in the conjunction of succeeding at encoding all the capabilities that are necessary to understand humans while failing in such a way as to make the AI use those capabilities on all problems except understanding what it is meant to do.”

    Understanding humans will fall out of trying to achieve any goal. MIRI suggests that programming the goal you want is the hard part?

    I do think there is a lot wrong with your arguments, but after thinking for a bit, I’m not sure that giving an AGI a natural-language mission statement, as you seem to suggest, is *necessarily* unfeasible.

  • John_Maxwell_IV

    Good question. Alexander’s post caused me to start thinking along the same lines. I don’t know what MIRI’s response would be to this. It seems like it could work out, but I’m not sure I’d want to bet the human race on it.

  • Xagor et Xavier

    That uncertainty regards point 2 of my distillation, I think. If the general AI isn’t intelligent enough, it could misinterpret what the human means and then disaster follows. The not-quite-intelligent-enough AI is then a counterexample to point 2. It can take over the world, but it isn’t a superintelligence.

    But having phrased it as a point 2 problem, it doesn’t look that different from other point 2 problems. The misinterpreting AI is a risk to existence for the same reason that an AI that lets its subgoals override its terminal goals is a risk.

    So it would seem the contention is that MIRI thinks it is possible for an AI to be a superintelligence yet misinterpret, while on the other hand, it is not possible for an AI to be a superintelligence and let its subgoals override its terminal goals. But I would say that, because fuzzy-to-exact parsing is part of intelligence, a greater intelligence will be better at understanding humans than a lesser intelligence. Misinterpretation is no less an “ordinary” intelligence failure than is terminal goal overwriting.

    There may still be a risk from all sorts of imperfect intelligences or non-intelligences. Gray goo and superviruses are dangerous but not at all intelligent, for instance. But there’s little reason to single out misinterpretation as somehow special.

  • It’s possible for an AI to work in some of the ways the creators intend but not all of them.

    To take over the world such an AI will have to work perfectly along a huge number of dimensions. Such a set of capabilities does not happen by chance but will have to be encoded by humans, which means that the existence of such an AI requires humans to be orders of magnitude better at encoding intended features than they are today.

    In the same way, it’s possible to have a program that behaves as its creator intended in the sense that it’s generally intelligent but not in the sense that it’s friendly.

    But unfriendliness is a result of a misinterpretation of what the AI was meant to do, which is itself a result of either not using its capabilities correctly or being unable to use them. In both cases such a failure mode will be detrimental to its ability to take over the world.

    Understanding humans will fall out of trying to achieve any goal. MIRI suggests that programming the goal you want is the hard part?

    Once you are capable of encoding the capability to understand any goal, you only need to give the AI a goal and it will figure out what you meant, because understanding the world is an important feature of general intelligence, without which it could not be powerful in the first place. See my post, ‘To beat humans you have to define “winning”’.

  • Pingback: Alexander Kruel · Thank you for steelmanning my arguments

  • Pingback: Alexander Kruel · MIRI/LessWrong Critiques: Index

  • Pingback: The seed is not the superintelligence | nothing is mere