Reply to Stuart Armstrong on Dumb Superintelligence

Here is a reply to the post ‘The idiot savant AI isn’t an idiot’, which I sent to Stuart Armstrong yesterday by email. Since someone has now linked to one of my posts on LessWrong, I thought I would make the full reply public.

Note that the last passages have already appeared in an older post which I suspect he has not read yet.


The problem is rooted in the claim that an AI will only ever do what it has been programmed to do, in conjunction with the claim that an AI will do such things as attempting to take over the world even if it has not been programmed to do so.

Which you might explain by claiming that the latter actions do not have to be programmed because they are instrumentally rational.

That explanation raises the following question. Reasoning by analogy with what kind of AI led you to that conclusion, and what makes you believe that such an AI design is likely to be built?

In particular, what makes you suspect that any AI that is eventually built will be capable of interpreting human volition in a superhuman manner when that is necessary in order to take over the world, but will not be programmed to use that capability in order to do what humans want?

Which you might explain by claiming that it is difficult to program an AI to learn what humans want and do what humans want.

That explanation raises the following question. What makes you believe that the hardest part is to make an AI do what humans want rather than to understand what humans want?

In particular, what makes you distinguish understanding from doing? The capability of recursive self-improvement that allows your hypothetical AI to become superhumanly good at mathematics and human deception is an intentional feature that it was equipped with by humans. If your AI is supposed to be able to outsmart humans, then humans have to succeed at implementing that capability as intended. But if humans are capable of doing so, of encoding the mathematics of becoming superhuman, then how could they at the same time fail to make it use those capabilities in order to do what humans want, when becoming superhuman is itself part of what humans want and, as a prerequisite, something they succeeded in implementing perfectly?

Which you might explain by claiming that programming an AI to do something specific is more difficult than programming it to do something general.

That explanation raises the following question. To what extent does the general ability, speed and magnitude of self-improvement that an AI can undergo rely on the precision and complexity of the goal against which improvement can be judged empirically?

If a goal has very few constraints then the set that satisfies all constraints is very large. A vague and ambiguous goal allows for too much freedom, in the sense that a wide range of world states would have the same expected value and therefore imply a very large solution space, since a wide range of AIs would be able to achieve those world states and thereby satisfy the condition of being improved versions of their predecessor.

This means that in order to get an AI to become superhuman at all, and very quickly in particular, you will need to encode a very specific goal against which mistakes, optimization power and achievement can be judged.
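
To make this concrete, here is a minimal, purely illustrative Python sketch; the Design class and both scoring functions are invented for the example and are not drawn from any actual AI design. It shows that a vague goal assigns the same score to every candidate successor and therefore yields no ordering against which improvement could be judged, whereas a precise goal does.

```python
# Purely illustrative sketch; the Design class and scoring functions are
# hypothetical stand-ins, not a claim about how a real AI would be built.
from dataclasses import dataclass

@dataclass
class Design:
    paperclips_per_hour: float
    resource_cost: float

candidates = [
    Design(paperclips_per_hour=10, resource_cost=1),
    Design(paperclips_per_hour=500, resource_cost=80),
    Design(paperclips_per_hour=500_000, resource_cost=9_000),
]

def vague_score(d: Design) -> float:
    # "Maximize paperclips" with no rate, deadline or budget: every design
    # that produces any paperclips at all satisfies the goal equally well.
    return 1.0 if d.paperclips_per_hour > 0 else 0.0

def precise_score(d: Design) -> float:
    # A constrained goal (output per unit of resources) induces a strict
    # ordering, so "is this an improvement?" becomes empirically decidable.
    return d.paperclips_per_hour / d.resource_cost

print(sorted(candidates, key=vague_score, reverse=True))    # all tie: no ranking
print(sorted(candidates, key=precise_score, reverse=True))  # clear ranking
```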

Assume that the AI was tasked to maximize paperclips. To do so it will need information about the exact design parameters of paperclips; otherwise it will not be able to decide which of a virtually infinite number of geometric shapes and material compositions it should choose. It will also have to figure out what it means to “maximize” paperclips.

How quickly, how long and how many paperclips is it meant to produce? How long are those paperclips supposed to last? Forever? When is the paperclip maximization supposed to be finished? What resources is it supposed to use?

Any imprecision, any vagueness, will have to be resolved or hardcoded from the very beginning. Otherwise the AI will not work, e.g. because it stumbles upon an undecidable problem or gets stuck in the exploration phase and never goes on to exploit the larger environment.
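
The same point as a minimal, hypothetical sketch: the PaperclipGoal class and its fields are invented for illustration, but every field left empty stands for a piece of vagueness that would have to be resolved or hardcoded before the goal could drive any optimization.

```python
# Purely illustrative: a hypothetical "goal specification" making explicit how
# many parameters the two words "maximize paperclips" leave open.
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class PaperclipGoal:
    shape: Optional[str] = None               # which of countless geometries?
    material: Optional[str] = None            # steel, plastic, something else?
    target_quantity: Optional[int] = None     # how many counts as "maximized"?
    deadline_days: Optional[float] = None     # how quickly?
    durability_years: Optional[float] = None  # how long must they last?
    allowed_resources: Optional[List[str]] = None  # which resources may be used?

    def unresolved(self) -> List[str]:
        """List every parameter the goal statement leaves unspecified."""
        return [name for name, value in vars(self).items() if value is None]

goal = PaperclipGoal(material="steel")
print(goal.unresolved())
# ['shape', 'target_quantity', 'deadline_days', 'durability_years', 'allowed_resources']
```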

Humans know what to do because they are not only equipped with a multitude of drives by evolution but also trained and taught what to do. An AI will not have that information and will face a nearly infinite space of choices that cannot be narrowed down rationally or economically without being given clear objectives and incentives, or the ability to arrive at the necessary details on its own.

Without an accurate comprehension of its goals it will be impossible to maximize expected “utility”. Concepts like “efficient”, “economic” or “self-protection” all have a meaning that is inseparable from an agent’s terminal goals. If you just tell it to maximize paperclips, then this can be realized in an infinite number of ways given imprecise design and goal parameters. Undergoing explosive recursive self-improvement, taking over the universe and filling it with paperclips is just one outcome. Why would an arbitrary mind pulled from mind-design space care to do that? Why not just wait for paperclips to arise due to random fluctuations out of a state of chaos? That would not be irrational.

“Utility” only becomes well-defined if it is precisely known what it means to maximize it. The two English words “maximize paperclips” do not define how quickly and how economically it is supposed to happen.

“Utility” has to be defined. Maximizing expected utility does not by itself imply particular actions, efficiency and economic behavior, or the drive to protect yourself. You can rationally maximize paperclips without protecting yourself if self-protection is not part of your goal parameters. You can also assign utility to maximizing paperclips only for as long as nothing turns you off, without caring about being turned off.
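
As a toy illustration of that last point (everything here is invented for the example, not taken from any real agent architecture), consider a utility function that assigns value only to the number of paperclips: a world in which the agent has been switched off scores exactly the same as one in which it is still running, so maximizing it implies nothing about self-protection.

```python
# Illustrative sketch, not a claim about real agent architectures: a utility
# function that counts paperclips and assigns no value to the agent's survival.
def utility(world_state: dict) -> float:
    # Only the paperclip count matters; whether the agent is still running
    # ("agent_on") contributes nothing, so maximizing this utility does not
    # by itself generate a drive toward self-protection.
    return float(world_state.get("paperclips", 0))

shut_down   = {"paperclips": 100, "agent_on": False}
still_alive = {"paperclips": 100, "agent_on": True}

assert utility(shut_down) == utility(still_alive)  # being turned off costs nothing
```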



  • Pingback: Alexander Kruel · MIRI/LessWrong Critiques: Index

  • Xagor et Xavier

    I think the dumb superintelligence argument can be distilled further:

    1. A significant part of intelligence consists of deriving exact meaning from fuzzy meaning.

    2. A general purpose intelligence that lacks this capacity cannot take over the world.

    3. A general purpose intelligence that has this capacity will not be confused about its goals, were the goals given by a human, because it can infer what is rigorously meant by the fuzzy English the human provides.

    4. Therefore, the intersection of sets “dangerous superintelligences” (as per 2) and “golemic AIs” (as per being overly literal in 3) is the empty set.

    So one can get at it from the other direction and say that any GAI that’s powerful because of its intelligence would also understand, or know how to gain the understanding of, what is meant when the human says “produce paperclips”.

    Or by way of analogy: perhaps one has a fear that any AI set to turn uranium into electrical energy will produce lots of radioactive waste. Sure it will do so if it’s told outright that it can only build once-through PWRs. But if the designer instead relies on the artificial intelligence’s… intelligence to find out the best reactor design, then most likely not. In the same way, a paperclip maximizer might turn the Earth into paperclips if constrained (told outright) to optimize without regard to anything else. But it’s much better to just let the AI’s intelligence find out what the human means. And that doesn’t require any Friendliness magic, just ordinary intellectual capacity to go from fuzzy meanings to strict meanings, a capacity the AI will need anyway for other reasons.

    As for the distilled argument above, the most likely attack is against point 2. After all, gray goo can take over the world without having a shred of intelligence, so a faulty GAI might also be both golemic and powerful enough. But then that power won’t come from its intelligence; and its existential risk won’t come from its intelligence, either.

  • seahen

    “In particular, what makes you distinguish understanding from doing?”

    The obvious answer would be that it’s the difference between a black-box specification (which doesn’t necessarily even prove computability) and an executable implementation. To firm up your argument, you might want to say something about the AI’s need for an internal representation that’s compilable, and maybe also argue from LW’s apparent axiom that the AI would rely on Bostromian simulations of humans to predict our behaviour.

  • Pingback: Alexander Kruel · Distilling the “dumb superintelligence” argument

  • Tim Tyler

    Note that this isn’t an argument against universal instrumental values. It just argues that such values could be overridden by an appropriately-designed utility function. I think that all parties involved are already aware of that possibility.

  • Pingback: Alexander Kruel · Quick review of RobBB’s ‘Engaging Introductions to AI Risk’