Here is a reply to the post ‘The idiot savant AI isn’t an idiot’, which I sent to Stuart Armstrong yesterday by email. Since someone has now linked to one of my posts on LessWrong, I thought I would make the full reply public.
Note that the last passages have already appeared in an old post which I suspect he has not read yet.
The problem is rooted in the claim that an AI will only ever do what it has been programmed to do in conjunction with the claim that an AI will do such things as attempting to take over the country even if it has not been programmed to do so.
Which you might explain by claiming that the latter actions do not have to be programmed because they are instrumentally rational.
That explanation raises the following question. Reasoning by analogy with what kind of AI led you to that conclusion, and what makes you believe that such an AI design is likely to be built?
In particular, what makes you suspect that any AI that is eventually built will be capable of interpreting human volition in a superhuman manner if that is necessary in order to take over the world, but will not be programmed to use that capability in order to do what humans want?
Which you might explain by claiming that it is difficult to program an AI to learn what humans want and do what humans want.
That explanation raises the following question. What makes you believe that the hardest part is to make an AI do what humans want rather than to understand what humans want?
In particular, what makes you distinguish understanding
Which you might explain by claiming that programming an AI to do something specific is more difficult than programming it to do something general.
That explanation raises the following question. To what extent does the general ability, speed and magnitude of self-improvement that an AI can undergo rely on the precision and complexity of the goal against which improvement can be judged empirically?
If a goal has very few constraints, then the set of outcomes that satisfies all of them is very large. A vague and ambiguous goal allows too much freedom: a wide range of world states would have the same expected value, which implies a very large solution space, since a wide range of AIs will be able to achieve those world states and thereby satisfy the condition of being improved versions of their predecessor.
This means that in order to get an AI to become superhuman at all, and very quickly in particular, you will need to encode a very specific goal against which mistakes, optimization power and achievement can be judged.
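The relationship between the number of constraints and the size of the solution space can be made concrete with a toy example. The following is only a sketch under invented assumptions: the design space (ten lengths, three materials, three shapes) and the function `satisfying_designs` are hypothetical and chosen purely for illustration.

```python
from itertools import product


def satisfying_designs(constraints):
    """Return every candidate design that passes all given constraints."""
    # Hypothetical toy design space: (length_mm, material, shape).
    lengths = range(10, 101, 10)  # 10 possible lengths
    materials = ("steel", "plastic", "copper")
    shapes = ("loop", "triangle", "spiral")
    candidates = product(lengths, materials, shapes)
    return [d for d in candidates if all(check(d) for check in constraints)]


# A vague goal imposes no constraints at all.
vague_goal = []

# A specific goal pins down every parameter of the design.
specific_goal = [
    lambda d: d[0] == 30,
    lambda d: d[1] == "steel",
    lambda d: d[2] == "loop",
]

n_vague = len(satisfying_designs(vague_goal))
n_partial = len(satisfying_designs(specific_goal[:1]))
n_specific = len(satisfying_designs(specific_goal))

print(n_vague, n_partial, n_specific)
```

With no constraints every design counts as a success, so nothing distinguishes a better solution from a worse one. Each added constraint shrinks the set of acceptable outcomes until there is something definite to improve against.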
Assume that the AI was tasked to maximize paperclips. To do so it will need information about the exact design parameters of paperclips, or otherwise it will not be able to decide which of a virtually infinite number of geometric shapes and material compositions it should choose. It will also have to figure out what it means to “maximize” paperclips.
How quickly, how long and how many paperclips is it meant to produce? How long are those paperclips supposed to last? Forever? When is the paperclip maximization supposed to be finished? What resources is it supposed to use?
Any imprecision, any vagueness will have to be resolved or hardcoded from the very beginning. Otherwise the AI will not work, e.g. because it stumbles upon an undecidable problem or gets stuck in the exploration phase and never goes on to exploit the larger environment.
Humans know what to do because they are not only equipped with a multitude of drives by evolution but also trained and taught what to do. An AI will not have that information and will face the challenge of nearly infinite choice, which can’t be resolved rationally or economically unless it is given clear objectives and incentives, or the ability to arrive at the necessary details.
Without an accurate comprehension of its goals it will be impossible to maximize expected “utility”. Concepts like “efficient”, “economic” or “self-protection” all have a meaning that is inseparable from an agent’s terminal goals. If you just tell it to maximize paperclips then this can be realized in an infinite number of ways given imprecise design and goal parameters. Undergoing explosive recursive self-improvement, taking over the universe and filling it with paperclips, is just one outcome. Why would an arbitrary mind pulled from mind-design space care to do that? Why not just wait for paperclips to arise due to random fluctuations out of a state of chaos? That would not be irrational.
“Utility” only becomes well-defined if it is precisely known what it means to maximize it. The two English words “maximize paperclips” do not define how quickly and how economically it is supposed to happen.
“Utility” has to be defined. To maximize expected utility does not imply certain actions, efficiency and economic behavior, or the drive to protect yourself. You can also rationally maximize paperclips without protecting yourself if self-protection is not part of your goal parameters. You can also assign utility to maximizing paperclips as long as nothing turns you off, yet not care about being turned off.
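To illustrate that a drive such as self-protection follows from how utility is defined rather than from expected utility maximization itself, here is a minimal sketch. Everything in it is an assumption made for illustration: the two actions, the probabilities, the outcome numbers and both utility functions are invented.

```python
def expected_utility(lottery, utility):
    """Probability-weighted utility over a list of (probability, outcome)."""
    return sum(p * utility(outcome) for p, outcome in lottery)


# Hypothetical outcomes are (paperclips_made, hours_running).
actions = {
    # Shutdown is accepted: the agent runs 10 hours and makes 100 clips.
    "allow_shutdown": [(1.0, (100, 10))],
    # Resisting succeeds half the time and buys 100 hours of operation.
    "resist_shutdown": [(0.5, (1000, 100)), (0.5, (100, 10))],
}


def total_count(outcome):
    """Utility = total number of paperclips ever made."""
    return outcome[0]


def rate_while_running(outcome):
    """Utility = paperclips per hour, counted only while switched on."""
    return outcome[0] / outcome[1]


def preferred(utility):
    """Return all actions that tie for the highest expected utility."""
    scores = {a: expected_utility(l, utility) for a, l in actions.items()}
    best = max(scores.values())
    return sorted(a for a, s in scores.items() if s == best)


print(preferred(total_count))
print(preferred(rate_while_running))
```

Under `total_count` the agent prefers to resist shutdown, while under `rate_while_running` it is indifferent between the two actions. Both agents maximize expected utility; only the definition of “utility” differs.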
- AI drives vs. practical research and the lack of specific decision procedures
- To beat humans you have to define “winning”
- Narrow vs. General Artificial Intelligence (Addendum)
- AI risk scenario: Elite Cabal
- How does a consequentialist AI work?
- Taking over the world to compute 1+1
- Being specific about AI risks
- Implicit constraints of practical goals
- The Fallacy of Dumb Superintelligence