tl;dr If your superintelligence is too dumb to realize that it doesn’t have to take over the world in order to compute 1+1 then it will never manage to take over the world in the first place.
| Input | System | Output |
|---|---|---|
| from: Los Angeles to: San Francisco | Google Maps | I-5 N – 382 mi, 5 hours 33 mins |
| from: Los Angeles to: San Francisco | AGI | Human extinction. |
| “Set up a meeting about the sales report at 9 a.m. Thursday.” | Siri | Makes a calendar appointment at 9 a.m. Thursday. |
| “Set up a meeting about the sales report at 9 a.m. Thursday.” | AGI | Human extinction. |
| “Call hmm uhm Pet…” | Siri | Sorry, I don’t understand “Call hmm uhm Pet…” |
| “Call hmm uhm Pet…” | AGI | Human extinction. |
*AGI = Artificial General Intelligence
In other words, some AI risk advocates believe that, given any goal whatsoever, an artificial general intelligence will always fail at tasks that present-day software tools master with ease. More importantly, it will fail in a highly complex way, usually resulting in an extinction-type scenario. This means that it will fail selectively, only at doing what it is supposed to do, while succeeding in a superhuman manner at acting in exactly the way that causes human extinction.
I am writing this post because certain people don’t perceive that possibility to be unmistakably absurd and self-evidently ridiculous, so let’s look at it a bit more closely.
Taking over the world
Under what assumption does it make sense to take over the world?
(Here <taking over the world> can be understood to mean any set of actions an artificial agent could take to cause human extinction.)
Taking over the world is…
1. …an explicitly programmed terminal goal.
2. …an instrumental goal.
Do we have to worry about point #1? Maybe, but probably not. Humans are often unfriendly and, given the opportunity, some people would certainly try to use an artificial general intelligence to do really bad things. How likely is that to happen? As likely as it is that an artificial general intelligence is invented in one or a few giant leaps, fast enough that nobody becomes suspicious of possible ulterior motives. As likely as it is that the person or group smart enough to do so has extinction-type ulterior motives in the first place. As likely as it is that someone manages to explicitly program such complex goals. As likely as it is that an artificial general intelligence can overpower humanity. In other words, pure fantasy.
What about point #2? Something is instrumentally rational if it is useful in achieving terminal goals. The important point here is that for an agent to be able to conclude that something is instrumentally rational it is necessary for the agent in question to know exactly what terminal goals it has.
Suppose the terminal goal given is <build a hotel>. Is the terminal goal to create a hotel that is just a few nanometers in size? Is it to create a hotel that reaches orbit? It is unknown. The goal is too vague to determine what to do. There are countless ways to interpret the given goal, and each interpretation implies a different set of instrumental goals.
How would an artificial agent choose one interpretation over another? Would it make sense to simply assume the most resource-expensive interpretation and very likely end up doing more than necessary? Would taking over the world, or some other far-reaching action, make sense if it isn’t even clear that allocating massive resources to it is instrumentally rational? Would any artificial agent that didn’t mind taking a lot of unnecessary actions and wasting precious resources ever reach the point where it could constitute a risk?
The only reasonable action seems to be to reduce the vagueness by narrowing the goal down to its most probable interpretation. And given that the initial goals have been programmed by humans, it is obvious that the most probable source of further information is humans.
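As a toy illustration of this argument (not a model of any real system), the sketch below treats a vague goal as a list of candidate interpretations with made-up probabilities. The names `Interpretation`, `choose_interpretation`, and `human_answer`, the confidence threshold, and all of the numbers are hypothetical. The agent acts on the most probable reading only when it is confident enough; otherwise it asks the humans who set the goal.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Interpretation:
    description: str
    probability: float  # assumed belief that this is what the humans meant


def choose_interpretation(
    candidates: List[Interpretation],
    ask_human: Callable[[List[Interpretation]], Interpretation],
    confidence_threshold: float = 0.9,
) -> Interpretation:
    """Return the most probable reading, asking a human when unsure."""
    best = max(candidates, key=lambda c: c.probability)
    if best.probability >= confidence_threshold:
        return best
    # Too vague to act on: consult the most probable source of
    # further information, the humans who programmed the goal.
    return ask_human(candidates)


# Made-up beliefs about what <build a hotel> could mean.
hotel_goal = [
    Interpretation("hotel a few nanometers in size", 0.01),
    Interpretation("ordinary hotel for human guests", 0.60),
    Interpretation("luxury hotel with a spa", 0.38),
    Interpretation("hotel that reaches orbit", 0.01),
]


def human_answer(options: List[Interpretation]) -> Interpretation:
    # Stand-in for a real clarifying conversation.
    return next(o for o in options if "ordinary" in o.description)


chosen = choose_interpretation(hotel_goal, human_answer)
print(chosen.description)
```

With these assumed numbers no reading clears the threshold, so the agent falls back on asking rather than committing resources to an arbitrary interpretation.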
Anyway, regardless of the former conclusion, is taking over the world ever justified? Are the resources and time necessary to accomplish any such action, one that could wipe out humanity, instrumentally useful given most goals? Does it really make sense to build a bunker and kill all humans to make sure that you are unobstructed in calculating 1+1? Does it really make sense to turn everyone into paperclips if you are only supposed to create more paperclips than the best competitor without interfering with the world at large? And if you are unsure, doesn’t it make sense to first learn what you are actually supposed to do so that you don’t do more than necessary? And if you don’t care about all that and just take a lot of unnecessary actions and make arbitrary interpretations about the physical universe, could you manage to take over the world in the first place?
Taking incredibly uneconomic actions by drawing arbitrary conclusions would be disastrous to an agent’s own capabilities. If it were not able to resolve the vagueness inherent in its goals (all goals are vague when applied to the real world), then it would never become a risk in the first place.
If, for example, an agent concluded that maximizing paperclips required allocating huge amounts of resources to taking over the world when, given far fewer resources, it could have figured out that such complex action was unnecessary (e.g. by tapping a physical information resource called the human brain), then it would never reach the point where it could take over the world in the first place. It would similarly misinterpret countless other problems on the way towards superhuman intelligence.
And even if I were to grant that points #1 and #2 were likely, there is no reason to believe that any research conducted today is going to end up with something that is universally superhuman except at understanding what it is supposed to do, or that understands it but fails to do it. That’s just one ridiculously unlikely outcome dreamed up to rationalize a certain set of beliefs.
And no…you can’t compare failures of current software products with something that is supposed to be capable of taking over the world. That Windows 8 fails to do what I want in certain cases is not proof that a superintelligence could fail the same way. If anything, the fact that current software products work reasonably well, given that they are dumb as bread, is evidence that something much smarter will work much better at the same task.