Thank you for steelmanning my arguments

Related to: Distilling the “dumb superintelligence” argument

To steelman: the act of figuring out even better arguments for your opponents’ positions while arguing with them, and then beating those arguments, rather than beating only their actual arguments, their weakest arguments (weak-manning), or caricatures of their arguments (straw-manning). [source]

Someone called Xagor et Xavier has again commented on one of my posts with a better and more concise formulation of some of my arguments. If that person believes those arguments to be flawed (I do not know if they do), then that would increase my confidence in being wrong, since in order to rephrase my arguments more clearly they obviously have to understand what I am arguing. But at the same time I am also confident that much smarter people than me, especially experts, could think of much stronger arguments against the case outlined by some AI risk advocates.

My own attempt at steelmanning the arguments of AI risk advocates can be found in my primer on risks from AI.

In this post I attempt to improve upon the refinement of the “dumb superintelligence” argument outlined in my last post.

Argument: Fully intended behavior is a very small target to hit.


(1) General intelligence is a very small target to hit, requiring a very small margin of error.

(2) Intelligently designed systems do not behave intelligently as a result of unintended consequences.[1]

(3) By steps 1 and 2, for an AI to be able to outsmart humans, humans will have to intend to make an AI capable of outsmarting them and succeed at encoding that intention.

(4) Intelligence is instrumentally useful because it enables a system to hit smaller targets in larger and less structured spaces.[2]

(5) In order to take over the world a system will have to be able to hit a lot of small targets in very large and unstructured spaces.

(6) The intersection of the sets of “AIs in mind design space” and “the first probable AIs to be expected in the near future” contains almost exclusively those AIs that will be designed by humans.

(7) By step 6, what an AI is meant to do will very likely originate from humans.

(8) It is easier to create an AI that applies its intelligence generally than to create an AI that only uses its intelligence selectively.[3]

(9) An AI equipped with the capabilities required by step 5, given step 7 and 8, will very likely not be confused about what it is meant to do if it was not meant to be confused.

(10) Therefore the intersection of the sets of “AIs designed by humans” and “dangerous AIs” contains almost exclusively those AIs which are deliberately designed to be dangerous by malicious humans.


[1] Software such as Mathematica will not casually prove the Riemann hypothesis if it has not been programmed to do so. Given intelligently designed software, world states in which the Riemann hypothesis is proven will not be achieved unless they were intended, because unintended consequences are, on the whole, chaotic rather than directed.

[2] As the intelligence of a system increases, the precision of the input necessary to make the system do what humans mean it to do decreases. For example, systems such as IBM Watson or Apple’s Siri do what humans mean them to do when fed with a wide range of natural language inputs, while less intelligent systems such as compilers or Google Maps need very specific inputs in order to satisfy human intentions. Increasing the intelligence of Google Maps would enable it to satisfy human intentions by parsing less specific commands.

[3] For an AI to misinterpret what it is meant to do, it would have to selectively suspend its ability to derive exact meaning from fuzzy meaning, which is a significant part of general intelligence. This would require its creators to restrict their AI and specify an alternative way for it to learn what it is meant to do (which takes additional, intentional effort), because an AI that does not know what it is meant to do, and which is not allowed to use its intelligence to learn what it is meant to do, would have to choose its actions from an infinite set of possible actions. Such a poorly designed AI will either (a) not do anything at all or (b) not be able to decide what to do before the heat death of the universe, given limited computational resources. Such a poorly designed AI will not even be able to decide whether trying to acquire unlimited computational resources is instrumentally rational, because it will be unable to decide whether the actions required to acquire those resources might be instrumentally irrational from the perspective of what it is meant to do.


  1. Xagor et Xavier

    I intended my reply to be a one-off – that’s why I didn’t register on Disqus – but then I kept writing 🙂 My own position is neither quite yours nor quite MIRI’s, but I do think I am closer to yours than to MIRI’s. I was looking through webpages while searching for something completely unrelated, and found a page listing the latest Lesswrong posts. Seeing the paperclip optimization claim yet again was the straw that broke the camel’s back. I thought I would write something arguing against MIRI’s idea that misinterpretation is a risk separate from intelligence risks… and so I ended up distilling your argument on your blog. I didn’t go directly to Lesswrong, since I have learned from internet debate that trying to argue against a community’s fundamental positions can be quite unpleasant. I don’t know if Lesswrong or MIRI is like that, but I just don’t want to take the chance.

    But I don’t think your argument is bullet-proof either, so I can still steel-man. I don’t think market success will completely reliably filter out the kind of intelligences that could be a risk. To make this point, I will use an analogy: organizations. To take a leaf out of Helios’ book (from Deus Ex), you might call organizations industrial-age AIs that use humans as parts. They’re not very good, and they’re certainly not superintelligences in the sense MIRI uses the word. Yet they can be more powerful than individual humans. Furthermore, the presence of state regulation suggests that the organizations, left to their own devices, will start to veer off course pretty quickly. Even the closest we have to chartered organizations, single-purpose governmental agencies and Quangos, can start to pursue their own goals.

    The point is this: we can’t know that the intelligences that will be produced will be close enough to general AI that we’re sure they can keep track of their goals and only their goals. Perhaps they’re not very good at inferring exact meaning from fuzzy meaning, or they have some glitch or internal dynamic that makes them not listen to the original goal (the way organizations have principal-agent problems). The danger is that such a somewhat-general intelligence will seem harmless enough – the way small organizations may seem harmless – that it’ll be given the capacity to self-modify, and will then self-modify up to, say, a maximum limit of 50x human capacity and go off-piste. Or it might not even improve its intelligence. Perhaps it just reproduces itself until it has great material power, and then its problems become apparent. That would be more like the gray goo exception in my first distillation. The machine isn’t a general AI, but its combination of intelligence and brute force is still sufficiently great to overpower humans.

    Inherent in this is an idea that intelligence grants an asymptotically greater ability to control the environment. If the constant factors for two different machines are such that the less intelligent machine overpowers the more intelligent one before the latter can get an asymptotic advantage, then you’re SOL. Or by analogy: humans might, given enough time, manage to mop up any gray goo disaster, since humans have greater intelligence. But if the humans are turned into more gray goo before they can put their plan into action, they’re still equally dead.

    That’s how I’d argue if I wished to argue that AI research can lead to existential risk. If asked how to mitigate that risk, I would reply by favoring a bootstrapping of general intelligence: we improve machine intelligence, then we use the machines to improve our own, then we improve the machines further, and so on. At each step, we should have a great number of people at the previous intelligence level for oversight. The oversight keeps the process anchored until intelligence itself is good enough to keep it from going off course.

    If general intelligence only requires a single smart algorithm that can then properly expand itself, then there’s no need for the above. But if general intelligence is scruffy, then at the very least, one should limit its capacity to act on the world so that the constant factor doesn’t overpower the asymptote.
