Being specific about AI risks

There are no alien programs. No programs are generated from random noise. All current software obeys human commands, directly or indirectly; those commands are either hardcoded or entered later.

Mistakes are made. Sometimes a program does something that was not intended. Often such failures result in a crash or some other obstruction of the program’s own workings. Yet software is constantly improved.

If software weren’t constantly improved to be better at doing what humans intend it to do, would we ever reach a level of sophistication where software could work well enough to outsmart us? To do so it would have to work as intended along a huge number of dimensions.

Avoid quantum leaps, be specific

Imagine some hypothetical railroad management system that “keeps the trains running”. A so-called expert system. A narrow artificial intelligence. It keeps the trains on schedule. It checks that no two trains interfere with each other. It analyzes data from sensors attached to the trains that scan the rails for weaknesses and other defects. It even uses cameras to watch railroad crossings for possible obstructions. It also accepts input from the train personnel about delays and emergencies.
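To see just how narrow such an expert system is, consider what its core conflict check might amount to: rule-based bookkeeping over occupancy windows. The following is a minimal sketch; the block IDs, train names, and data layout are all invented for illustration.

```python
# Minimal sketch of a narrow, rule-based train-conflict check.
# A "schedule" maps a track-block ID to a list of (train, arrival, departure)
# occupancy windows, given in minutes since midnight.

def windows_overlap(a_start, a_end, b_start, b_end):
    """Two half-open time windows overlap if each starts before the other ends."""
    return a_start < b_end and b_start < a_end

def find_conflicts(schedule):
    """Return every pair of trains booked into the same block at the same time."""
    conflicts = []
    for block, occupancies in schedule.items():
        for i in range(len(occupancies)):
            for j in range(i + 1, len(occupancies)):
                t1, s1, e1 = occupancies[i]
                t2, s2, e2 = occupancies[j]
                if windows_overlap(s1, e1, s2, e2):
                    conflicts.append((block, t1, t2))
    return conflicts

schedule = {
    "block-7": [("IC-101", 600, 615), ("RE-20", 610, 625)],  # overlapping windows
    "block-9": [("IC-101", 615, 630)],
}
print(find_conflicts(schedule))  # [('block-7', 'IC-101', 'RE-20')]
```

Nothing in this loop “wants” anything; it mechanically flags overlapping bookings, which is exactly the kind of specificity the rest of this post asks for.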

Now suppose the railway company wanted to improve the system and hired an artificial general intelligence (AGI) researcher to do the job.

To detect what exactly might cause the system to behave badly, and to avoid unwarranted assumptions such as ascribing human behavior to the process, we’ll assume that the system is improved incrementally rather than being replaced all at once.

The first upgrade is a replacement of the current mainframe with a sophisticated supercomputer. For now this upgrade has no effect, since the software hasn’t changed other than being adapted to the new computational infrastructure.

The next upgrade concerns the input system that allows the train personnel to report delays and emergencies. The previous input method was to press one of two buttons: one for a delay and one for an emergency. The buttons have been replaced by a microphone feeding into a sophisticated natural language interpretation module, which can parse any delay or emergency message uttered in natural language and, upon detection, return the same data that would have been returned had someone pressed the buttons instead.
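The contract of this upgrade is narrow: however sophisticated the language module is, its output must reduce to the same two signals the buttons produced. A crude keyword-matching stand-in makes the interface explicit; the function name and keyword lists here are made up, a sketch rather than a real speech pipeline.

```python
# Stand-in for the natural-language input module: whatever it parses,
# it must emit the same two signals the old buttons produced.
DELAY, EMERGENCY = "DELAY", "EMERGENCY"

EMERGENCY_WORDS = {"emergency", "fire", "accident", "derailment", "injured"}
DELAY_WORDS = {"delay", "delayed", "late", "behind schedule"}

def parse_report(utterance):
    """Map a spoken report to the button signal it replaces, or None."""
    text = utterance.lower()
    if any(word in text for word in EMERGENCY_WORDS):
        return EMERGENCY  # emergencies take priority over delays
    if any(word in text for word in DELAY_WORDS):
        return DELAY
    return None  # unintelligible reports trigger no signal

print(parse_report("Train 214 is running ten minutes late"))  # DELAY
```

However much machinery sits behind `parse_report`, the rest of the system sees exactly what it saw before: one of two signals.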

Further upgrades include an advanced visual pattern recognition module that uses the camera feeds to detect potentially dangerous humans inside the trains or near the tracks and notifies the railway police, and a fleet of drones roaming the railroad stations to provide service information to travelers and watch for security breaches.

At some point the hired AGI researcher decides it is time to implement something more sophisticated. The program will be able to simulate alternative timetables and look for improvements based on previous delays, ticket sales, and data from its other sensors, such as service requests from people at the railroad stations. If it finds an improved timetable, it can autonomously decide to adopt it, test it against the real world, and make further improvements.
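Stripped of the mystique, this new capability is an optimization loop: propose a variant timetable, score it against recorded delays, keep whichever candidate scores best. A hill-climbing sketch, with an invented cost function and invented data:

```python
import random

def timetable_cost(timetable, observed_delays):
    """Cost of a timetable: unabsorbed delay at each stop, plus a small
    penalty per minute of scheduled slack (slack lengthens journeys)."""
    return sum(max(0, observed_delays.get(stop, 0) - slack) + 0.2 * slack
               for stop, slack in timetable.items())

def improve_timetable(timetable, observed_delays, steps=200, seed=0):
    """Hill climbing: nudge one stop's slack by a minute, keep the change
    only if the cost strictly drops."""
    rng = random.Random(seed)
    best = dict(timetable)
    best_cost = timetable_cost(best, observed_delays)
    for _ in range(steps):
        candidate = dict(best)
        stop = rng.choice(list(candidate))
        candidate[stop] = max(0, candidate[stop] + rng.choice([-1, 1]))
        cost = timetable_cost(candidate, observed_delays)
        if cost < best_cost:  # autonomously adopt only strict improvements
            best, best_cost = candidate, cost
    return best

current = {"central": 2, "airport": 0, "harbor": 1}   # slack minutes per stop
delays = {"central": 3, "airport": 4}                 # observed delay minutes
better = improve_timetable(current, delays)
print(timetable_cost(better, delays) < timetable_cost(current, delays))  # True
```

The “autonomous decision” here is a comparison of two numbers. Whether the loop, the cost function, or anything else counts as dangerous is exactly the kind of question this post argues should be asked at this level of specificity.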


I think you can see where this is going. You can add further upgrades until the system reaches human or even superhuman capabilities. At some point it would make sense to call it an artificial general intelligence.

I will spare myself writing out further upgrades here. But feel free to continue, as long as you are not making any unwarranted, vague, unspecific leaps.

The fog of vagueness

It is incredibly easy to simply conjecture that turning any system into, or replacing it with, an artificial general intelligence will cause it to go berserk and kill all humans, kill all aliens in the observable universe, hack the matrix to prevent the simulator gods from shutting down the simulation, or give in to the first Pascal’s mugger offering to “keep the trains running” forever. But once you have to come up with a concrete scenario and outline specifically how that is supposed to happen, you’ll notice that you never actually reach such a tipping point, as long as you do not deliberately design the system to behave that way.

The only way you can arrive at any scenario where an artificial general intelligence is going to kill all humans is by being vague and unspecific, by ignoring real-world development processes, and by using natural language to describe some sort of fantasy scenario and invoke lots of technological magic.

Don’t be fooled by AI risk advocates hiding behind vague assertions. What those people do is cherry-pick certain science-fictional capabilities of a conjectured artificial intelligence while completely disregarding the developmental stages and evolutionary processes leading up to such an intelligence.

Vagueness Explosion

Take for example the original idea of an intelligence explosion (emphasis mine):

Let an ultraintelligent machine be defined as a machine that can far surpass all the intellectual activities of any man however clever. Since the design of machines is one of these intellectual activities, an ultra-intelligent machine could design even better machines; there would then unquestionably be an “intelligence explosion,” and the intelligence of man would be left far behind.

— I.J. Good, “Speculations Concerning the First Ultraintelligent Machine”

The whole argument is worthless rubbish because it is unspecific and vague to an extent that allows one to draw completely unwarranted, non-evidence-based conclusions.

Others are better than I am at explaining what is wrong here, so I’ll quote:

More generally, many of the objects demonstrated to be impossible in the previous posts in this series can appear possible as long as there is enough vagueness.  For instance, one can certainly imagine an omnipotent being provided that there is enough vagueness in the concept of what “omnipotence” means; but if one tries to nail this concept down precisely, one gets hit by the omnipotence paradox.  Similarly, one can imagine a foolproof strategy for beating the stock market (or some other zero sum game), as long as the strategy is vague enough that one cannot analyse what happens when that strategy ends up being used against itself.  Or, one can imagine the possibility of time travel as long as it is left vague what would happen if one tried to trigger the grandfather paradox.  And so forth.  The “self-defeating” aspect of these impossibility results relies heavily on precision and definiteness, which is why they can seem so strange from the perspective of vague intuition.

— Terence Tao, “The “no self-defeating object” argument, and the vagueness paradox”

Let’s try to restate I.J. Good’s original idea without some of the vagueness:

Let there be something that can far surpass all the activities of any man. Since design is one of these activities, something better could design even better; there would then unquestionably be an “explosion,” and man would be left far behind.

At best we’re left with a tautology. Nothing more specific can be drawn from the argument than that something that is better is better. No conclusions about the nature of that something can be drawn. Not whether it is logically possible at all. Not whether it is physically possible, let alone economically realizable. And even if it is possible in all of those senses, the idea provides no insight into how likely it is, when we are going to see an explosion, what the nature of the explosion would be, or how it is going to happen. We don’t even know how that initial something that is better is supposed to be created in the first place.

Yet it is possible to take that tautology, extend it indefinitely, and use it to infer further speculative conclusions. And if someone has doubts, you can just repeat that something that is better is better, and the gullible will follow you in droves. But don’t get any more specific, or the emptiness of your claims shall be revealed.


  • dmytryl

    Something else to chew on. Let’s consider our dreaded paperclip maximizer AI, later on referred to simply as AI.

    In reality, this AI has to consist of components that, alone, are not paperclip maximizers. For example, it is useful to divide tasks into separate categories that benefit from localized processing. One region of it may work on steel paperclips, another on copper paperclips, and a third on the stabilization of metallic hydrogen, maybe. Subdividing all the way down to simple parts. All those parts and conglomerates have to be quite well behaved – they can’t work like regular ‘utility maximizers’ and waste time thinking of hacking each other. A well-behaved artificial intelligence is thus pretty much a prerequisite for an ill-behaved one.

    Likewise with regard to the argument about the “space of random minds”. At the very least, we should restrict it to minds that are made of non-mind components – an enormous restriction which cuts out a lot of wild but vague imagination – and, if we are interested in the first minds, to those that are no longer minds if some difficult or impossible component is removed.

  • The software-improvement loop that you illustrate has a human mind at its very center, to design, implement and verify the desirability of every single change. Desirability according to human values and criteria.

    Now, self-improving software that does not contain this human input in its self-improvement loop needs some corresponding element to design, implement and verify the desirability of every change.

    Currently we can actually have software somewhat improve other software using very simple and measurable metrics (reduce the size of the code, reduce execution time) and a limited set of methods (refactoring, etc.), but we don’t have a way to make software decide what functionality would be good to add, or whether human users would evaluate any change in functionality as good or bad.

  • Then try to add improvement capabilities, incrementally, to avoid conjecturing a magic black box. At what point does self-improvement turn dangerous?

    Say you first give it the ability to improve its natural language parsing capability by using a big-data approach, watching millions of YouTube videos, etc.

    At some point you implement the capability of designing better CPUs and enable it to replace its current CPUs using drones it controls, and later to recompile its software for the new hardware. Would that be catastrophic?

    At another point you give it the capability to improve its CPU improvement module. Would that be catastrophic?

    The problem here is that at no point does there exist a tipping point where it would start to take over the world. Even though at some point, after a huge number of upgrades, it would be a self-improving general intelligence, since it would be capable of a huge number of tasks and able to improve each task and the software used to improve it. At that point humans would no longer be a central part.

    So what about making software decide to add new software? You can break that down into incremental steps as well. But the important point here is that we don’t have a way to make software decide what functionality would be good to add. And that vagueness, combined with conjecturing a superhuman intelligence that does know what functionality would be instrumentally useful to add, results in the idea of a catastrophic risk where it doesn’t know what humans would evaluate as good or bad. The problem is that it doesn’t know either: it knows neither what software it could add to better pursue its goal nor what humans would want. It doesn’t work at all. Yet it is possible to imagine, thanks to vagueness, an intelligence that does know how to improve itself but doesn’t know what humans want. Trying to imagine such an agent in an incremental fashion reveals that such a scenario isn’t going to happen.

    At some point software is going to write new, better software. But there will be a developmental and evolutionary process leading up to that outcome. Only by conjecturing the outcome independently of its origin can you imagine software that makes better software according to some goal devoid of human intention.



  • The only meaningful thing I get from all this is a seeming argument that a “deontological” approach, where someone says “Thou Shalt Do This” and specifies the exact manner in which the AI is allowed to do it, is safer than a “consequentialist” approach, where we specify what we want done and leave the AI to decide how best to do it.

    I can certainly believe this; unfortunately, you haven’t given me any reason to believe that a consequentialist AI is unable to FOOM.



