Posts related to AI risks and rationality.

Related to: Distilling the “dumb superintelligence” argument

To steelman: the act of figuring out even better arguments for your opponents’ positions while arguing with them and to beat those arguments rather than only their actual arguments or their weakest arguments (weak-manning) or caricatures of their arguments (straw-manning). [source]

Someone called Xagor et Xavier again commented on one of my posts with a better and more concise formulation of some of my arguments. If that person believes those arguments to be flawed (I do not know if they do) then that would increase my confidence in being wrong, since in order to rephrase my arguments more clearly they obviously have to understand what I am arguing. But at the same time I am also confident that much smarter people than me, especially experts, could think of much stronger arguments against the case outlined by some AI risk advocates.

My own attempt at steelmanning the arguments of AI risk advocates can be found in my primer on risks from AI.

In this post I attempt to improve upon the refinement of the “dumb superintelligence” argument outlined in my last post.

Argument: Fully intended behavior is a very small target to hit.


(1) General intelligence is a very small target to hit, requiring a very small margin of error.

(2) Intelligently designed systems do not behave intelligently as a result of unintended consequences.[1]

(3) By step 1 and 2, for an AI to be able to outsmart humans, humans will have to intend to make an AI capable of outsmarting them and succeed at encoding their intention of making it outsmart them.

(4) Intelligence is instrumentally useful because it enables a system to hit smaller targets in larger and less structured spaces.[2]

(5) In order to take over the world a system will have to be able to hit a lot of small targets in very large and unstructured spaces.

(6) The intersection of the sets of “AIs in mind design space” and “the first probable AIs to be expected in the near future” contains almost exclusively those AIs that will be designed by humans.

(7) By step 6, what an AI is meant to do will very likely originate from humans.

(8) It is easier to create an AI that applies its intelligence generally than to create an AI that only uses its intelligence selectively.[3]

(9) An AI equipped with the capabilities required by step 5, given step 7 and 8, will very likely not be confused about what it is meant to do if it was not meant to be confused.

(10) Therefore the intersection of the sets of “AIs designed by humans” and “dangerous AIs” contains almost exclusively those AIs which are deliberately designed to be dangerous by malicious humans.


[1] Software such as Mathematica will not casually prove the Riemann hypothesis if it has not been programmed to do so. Given intelligently designed software, world states in which the Riemann hypothesis is proven will not be achieved if they were not intended because the nature of unintended consequences is overall chaotic.

[2] As the intelligence of a system increases, the precision of the input that is necessary to make the system do what humans mean it to do decreases. For example, systems such as IBM Watson or Apple’s Siri do what humans mean them to do when fed with a wide range of natural language inputs, while less intelligent systems such as compilers or Google Maps need very specific inputs in order to satisfy human intentions. Increasing the intelligence of Google Maps will enable it to satisfy human intentions by parsing less specific commands.

[3] For an AI to misinterpret what it is meant to do it would have to selectively suspend using its ability to derive exact meaning from fuzzy meaning, which is a significant part of general intelligence. This would require its creators to restrict their AI and specify an alternative way to learn what it is meant to do (which takes additional, intentional effort), because an AI that does not know what it is meant to do, and which is not allowed to use its intelligence to learn what it is meant to do, would have to choose its actions from an infinite set of possible actions. Such a poorly designed AI will either (a) not do anything at all or (b) not be able to decide what to do before the heat death of the universe, given limited computational resources. Such a poorly designed AI will not even be able to decide whether trying to acquire unlimited computational resources is instrumentally rational, because it will be unable to decide whether the actions that are required to acquire those resources might be instrumentally irrational from the perspective of what it is meant to do.


Someone posted a distilled version of the argument that I tried to outline in some of my previous posts. In this post I try to refine the argument even further.

Note: In this post AI stands for artificial general intelligence.

(1) An AI will not be pulled at random from mind design space but instead be designed by humans.

(2) If an AI is meant to behave in a generally intelligent way then it will have to work as intended or otherwise fail to be generally intelligent.[1]

(3) A significant part of general intelligence consists of deriving exact meaning from fuzzy meaning.[2]

(4) An AI that lacks the capacity from step 3 cannot take over the world.

(5) By step 1, what an AI is meant to do will originate from humans.

(6) If not otherwise specified, an AI will always make use of the capacity required by step 3.[3]

(7) By step 6, an AI will not be confused about what it is meant to do.[4]

(8) Therefore the intersection of the sets of “intelligently designed AIs” and “dangerous AIs” only contains those AIs which are deliberately designed to be dangerous by malicious humans.[5]


[1] An AI is the result of a research and development process. A new generation of AIs needs to be better than other products at “Understand What Humans Mean” and “Do What Humans Mean” in order to survive the research phase and subsequent market pressure.

[2] When producing a chair an AI will have to either know the specifications of the chair (such as its size or the material it is supposed to be made of) or else know how to choose a specification from an otherwise infinite set of possible specifications. Given a poorly designed fitness function, or the inability to refine its fitness function, an AI will either (a) not know what to do or (b) not be able to converge on a qualitative solution, if at all, given limited computational resources.

[3] An AI can be viewed as a deterministic machine, just like a thermostat, only much more complex. An AI, just like a thermostat, will only ever do what it has been programmed to do.

[4] If an AI was programmed to be generally intelligent then it would have to be programmed to be selectively stupid in order to fail at doing what it was meant to do while acting generally intelligent at doing what it was not meant to do.

[5] “The two features <all-powerful superintelligence> and <cannot handle subtle concepts like “human pleasure”> are radically incompatible.” (The Fallacy of Dumb Superintelligence)

Further reading

An improved version of the above argument can be found here.


The basic claim underlying the argument that {artificial general intelligence} will constitute an existential risk is that it will {interpret} its terminal {goal} in such a way as to take {actions} that are {instrumentally rational} and which will cause human extinction. The terms in braces either seem to be overlapping or vague.

An AI (artificial intelligence) can be viewed as a deterministic machine, just like a thermostat, only much more complex. An AI, just like a thermostat, will only ever do what it has been {programmed} to do.

What is the difference between the encoding of a goal and an encoding of how to achieve a goal?

Given any computationally feasible AI, any goal will either have to be encoded in such a detail as to remove any vagueness or else will have to be interpreted somehow, in order to reduce vagueness.

Consider tasking the AI with creating a chair. If the size of the chair, or material of which it should be made, is undefined then the AI will have to choose a size and a material. How such a choice should be made will have to be encoded as well or otherwise the AI will not be able to make such a choice and therefore will not know what to do. The choice can either be encoded as part of the goal definition or as part of its capability to make such decisions.

This shows that there is no relevant difference between an encoding of a goal and an encoding of the capabilities used to achieve the goal when it comes to how an AI is going to act. Both the goal and the capabilities of an AI are encodings of {Understand What Humans Mean} and {Do What Humans Mean}.
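The chair example can be sketched in a few lines of Python. This is only an illustration under loud assumptions: `ChairSpec`, the two `build_with_*` functions and the default values are hypothetical names invented here, not part of any real system. The sketch shows the same choice (height and material) being encoded once inside the goal and once inside the system's decision rules, with identical resulting behavior.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class ChairSpec:
    """A goal description; None means the detail was left undefined."""
    height_cm: Optional[int] = None
    material: Optional[str] = None


# Hypothetical defaults standing in for the system's built-in capability
# to make such decisions on its own.
DEFAULT_HEIGHT_CM = 45
DEFAULT_MATERIAL = "wood"


def build_with_goal_encoding(spec: ChairSpec) -> Tuple[int, str]:
    """Every detail has to be present in the goal itself."""
    if spec.height_cm is None or spec.material is None:
        raise ValueError("underspecified goal: the system does not know what to do")
    return spec.height_cm, spec.material


def build_with_capability_encoding(spec: ChairSpec) -> Tuple[int, str]:
    """Missing details are filled in by decision rules that are part of the system."""
    height = spec.height_cm if spec.height_cm is not None else DEFAULT_HEIGHT_CM
    material = spec.material if spec.material is not None else DEFAULT_MATERIAL
    return height, material


full_goal = ChairSpec(height_cm=45, material="wood")
vague_goal = ChairSpec()

chair_a = build_with_goal_encoding(full_goal)         # choice encoded in the goal
chair_b = build_with_capability_encoding(vague_goal)  # choice encoded in the capability
```

Both calls yield the same chair. Passing `vague_goal` to `build_with_goal_encoding` raises an error instead, which corresponds to the case where the choice has been encoded nowhere and the system does not know what to do.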

If humans are likely to fail at encoding their intentions of how an AI is supposed to behave then the AI will be unable to outsmart humans because such a capability will have to be intentionally encoded for the same reason that software such as Mathematica will not casually prove the Riemann hypothesis if it has not been programmed to do so. As long as we are talking about intelligently designed software, world states in which the Riemann hypothesis is proven do not happen if they were not intended because the nature of unintended consequences is overall chaotic.

Also recognize that an AI would at least have to be able to locate itself in the universe in order to not destroy itself, let alone protect itself. Such a specification is already nontrivial and will have to work as intended or otherwise be detrimental to the AI’s capabilities.

How would an AI decide to take over the world if it has not been programmed to do so?

The answer is that it will only take over the world if it has been programmed to do so, either implicitly or explicitly.

The problem with AIs that take such actions without being explicitly programmed to do so is that they are unspecified to such an extent as to be computationally intractable, since a poorly designed fitness function will not allow the AI to converge on a qualitative solution, if at all, given computationally limited resources.

Humans in turn are programmed by evolution to behave according to certain drives in conjunction with the capability to be constantly programmed by the environment, including other agents.

Ends and the means to achieve those ends are not strictly separable in humans. A human being does not do something as quickly as possible unless it has been programmed by evolution or the environment to want to do so.

The same is true for AI. An AI will either not want to achieve a goal as quickly as possible or will not be capable of doing so if it has not been programmed to do so. This again highlights how the distinction between terminal goals, instrumental goals and an AI’s eventual behavior is misleading for practical AIs. What actions an AI is going to take depends on its general design and not on a specific part of its design that someone happened to label “goal”.


Here is a reply to the post ‘The idiot savant AI isn’t an idiot’ which I sent to Stuart Armstrong yesterday by email. Since someone has now linked to one of my posts on LessWrong I thought I would make the full reply public.

Note that the last passages have already appeared in an old post which I suspect he has not read yet.

The problem is rooted in the claim that an AI will only ever do what it has been programmed to do in conjunction with the claim that an AI will do such things as attempting to take over the country even if it has not been programmed to do so.

Which you might explain by claiming that the latter actions do not have to be programmed because they are instrumentally rational.

That explanation raises the following question. Reasoning by analogy with what kind of AI led you to that conclusion and what makes you believe that such an AI design is likely to be built?

In particular, what makes you suspect that any AI that is eventually built will be capable of interpreting human volition in a superhuman manner if it is necessary in order to take over the world but will not be programmed to use that capability in order to do what humans want?

Which you might explain by claiming that it is difficult to program an AI to learn what humans want and do what humans want.

That explanation raises the following question. What makes you believe that the hardest part is to make an AI do what humans want rather than to understand what humans want?

In particular, what makes you distinguish understanding from doing? The capability of recursive self-improvement that allows your hypothetical AI to become superhumanly good at mathematics and human deception is an intentional feature that it was equipped with by humans. If your AI is supposed to be able to outsmart humans then humans have to succeed at implementing that capability as intended. But if humans are capable of doing so, of encoding the mathematics of becoming superhuman, then how could they at the same time fail at making it use those capabilities in order to do what humans want, when becoming superhuman is part of what humans want, which as a prerequisite they succeeded in implementing perfectly?

Which you might explain by claiming that programming an AI to do something specific is more difficult than programming it to do something general.

That explanation raises the following question. To what extent does the general ability, speed and magnitude of self-improvement that an AI can undergo rely on the precision and complexity of the goal against which improvement can be judged empirically?

If a goal has very few constraints then the set that satisfies all constraints is very large. A vague and ambiguous goal allows for too much freedom in the sense that a wide range of world states would have the same expected value and therefore imply a very large solution space, since a wide range of AIs will be able to achieve those world states and thereby satisfy the condition of being improved versions of their predecessor.

This means that in order to get an AI to become superhuman at all, and very quickly in particular, you will need to encode a very specific goal against which mistakes, optimization power and achievement can be judged.
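The relation between the number of constraints and the size of the solution set can be made concrete with a toy enumeration. The design space below (lengths, materials and shape names) and the three constraints are hypothetical values chosen purely for illustration. The sketch only shows how each added constraint shrinks the set of candidates that count as a success.

```python
from itertools import product

# Hypothetical design space: every combination of length, material and shape.
lengths_mm = range(10, 101, 10)  # 10 values: 10, 20, ..., 100
materials = ["steel", "plastic", "copper", "wood"]
shapes = ["gem", "ideal", "owl", "spiral", "triangle"]
design_space = list(product(lengths_mm, materials, shapes))

# Each constraint is a predicate over a candidate design (length, material, shape).
constraints = [
    lambda d: d[1] == "steel",
    lambda d: d[2] == "gem",
    lambda d: 30 <= d[0] <= 40,
]


def satisfying(designs, active_constraints):
    """Return all designs that satisfy every active constraint."""
    return [d for d in designs if all(c(d) for c in active_constraints)]


# Size of the solution set when the first 0, 1, 2 and 3 constraints apply.
sizes = [
    len(satisfying(design_space, constraints[:k]))
    for k in range(len(constraints) + 1)
]
print(sizes)  # [200, 50, 10, 2]
```

With no constraints all 200 designs are equally acceptable, so there is nothing against which one design could be judged better than another. Only the fully constrained goal leaves a small target against which progress can be measured.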

Assume that the AI was tasked to maximize paperclips. To do so it will need information about the exact design parameters of paperclips, or otherwise it will not be able to decide which of a virtually infinite number of geometric shapes and material compositions it should choose. It will also have to figure out what it means to “maximize” paperclips.

How quickly, how long and how many paperclips is it meant to produce? How long are those paperclips supposed to last? Forever? When is the paperclip maximization supposed to be finished? What resources is it supposed to use?

Any imprecision, any vagueness will have to be resolved or hardcoded from the very beginning. Otherwise the AI will not work, e.g. because it stumbles upon an undecidable problem or gets stuck in the exploration phase and never goes on to exploit the larger environment.

Humans know what to do because they are not only equipped with a multitude of drives by evolution but also trained and taught what to do. An AI will not have that information and will face the challenge of nearly infinite choice that can’t be rationally or economically determined without being given clear objectives and incentives, or the ability to arrive at the necessary details.

Without an accurate comprehension of its goals it will be impossible to maximize expected “utility”. Concepts like “efficient”, “economic” or “self-protection” all have a meaning that is inseparable from an agent’s terminal goals. If you just tell it to maximize paperclips then this can be realized in an infinite number of ways given imprecise design and goal parameters. Undergoing explosive recursive self-improvement, taking over the universe and filling it with paperclips, is just one outcome. Why would an arbitrary mind pulled from mind-design space care to do that? Why not just wait for paperclips to arise due to random fluctuations out of a state of chaos? That would not be irrational.

“Utility” only becomes well-defined if it is precisely known what it means to maximize it. The two English words “maximize paperclips” do not define how quickly and how economically it is supposed to happen.

“Utility” has to be defined. To maximize expected utility does not imply certain actions, efficiency and economic behavior, or the drive to protect yourself. You can also rationally maximize paperclips without protecting yourself if it is not part of your goal parameters. You can also assign utility to maximize paperclips as long as nothing turns you off but don’t care about being turned off.
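A toy model makes the last point concrete. Everything in it is an assumption made for illustration: the horizon, the step at which the off switch is pressed, the two plans and both utility functions are hypothetical. The sketch compares a utility function that counts all paperclips ever made with one that only counts paperclips per step of operation and is therefore indifferent to being turned off.

```python
HORIZON = 10        # hypothetical number of time steps in the toy world
SHUTDOWN_STEP = 3   # the off switch is pressed after this many steps unless disabled


def simulate(disable_switch_first):
    """Return (paperclips made, steps spent running) for a plan."""
    if disable_switch_first:
        # One step is spent on the switch instead of on paperclips.
        return HORIZON - 1, HORIZON
    # The agent makes one paperclip per step until it is turned off.
    return SHUTDOWN_STEP, SHUTDOWN_STEP


def total_clips(outcome):
    """Utility that counts every paperclip over the whole horizon."""
    clips, _steps_running = outcome
    return clips


def clips_per_running_step(outcome):
    """Utility that only values output while running and ignores shutdown."""
    clips, steps_running = outcome
    return clips / steps_running


PLANS = {"protect_self": simulate(True), "ignore_switch": simulate(False)}


def best_plan(utility):
    """Pick the plan with the highest utility under the given utility function."""
    return max(PLANS, key=lambda name: utility(PLANS[name]))


print(best_plan(total_clips))             # protect_self
print(best_plan(clips_per_running_step))  # ignore_switch
```

Both agents maximize their utility without error. Whether self-protection comes out as the best plan depends entirely on how the utility function was defined.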


Reading the Wikipedia entry on Caenorhabditis elegans and how much we already understand about this small organism and its 302 neurons makes me even more skeptical of the claim that a human-level artificial intelligence (short: AI) will be created within this century.

C. elegans, by Bob Goldstein

Its pattern of connectivity, or “connectome”, has been completely mapped and we have the computational capacity to simulate it. Yet nobody is able to do so. Its genome is completely sequenced.

Many different people and teams of people have been studying this little nematode for decades. Yet, as John Baez once formulated it, nobody is able to create an AI that could navigate autonomously in a real-world environment and survive real-world threats and attacks with approximately the skill of C. elegans.

Would it be wrong to take this as evidence against human-level AI? If so, how? What makes you believe that it will be possible to create a human-level AI from scratch before it is possible to copy the skills of an already existing organism that is qualitatively and quantitatively many orders of magnitude less intelligent than humans?


Sir William Timothy Gowers, FRS (Fellow of the Royal Society) is a British mathematician. He is a Royal Society Research Professor at the Department of Pure Mathematics and Mathematical Statistics at the University of Cambridge, where he also holds the Rouse Ball chair, and is a Fellow of Trinity College, Cambridge. In 1998 he received the Fields Medal for research connecting the fields of functional analysis and combinatorics. [Homepage]

The Interview

Timothy Gowers: OK here are my answers, but with the qualification that for some questions I’m going to restrict attention to performance at mathematics. I don’t have enough appreciation of the technical difficulties associated with more general AI to feel confident about making predictions. But I do think that if a program can do mathematical research as well as humans, then science, engineering and programming can’t be far behind (especially programming).

Q1: Assuming beneficial political and economic development and that no global catastrophe halts progress, by what year would you assign a 10%/50%/90% chance of the development of artificial intelligence that is roughly as good as humans (or better, perhaps unevenly) at science, mathematics, engineering and programming?

Timothy Gowers: I think there’s a 10% chance we’ll have programs as good as humans at doing maths within 25 years, a 50% chance that we’ll have it within 40 years and a 90% chance that we’ll have it by the end of the century.

Q2: Once we build AI that is roughly as good as humans (or better, perhaps unevenly) at science, mathematics, engineering and programming, how much more difficult will it be for humans and/or AIs to build an AI which is substantially better at those activities than humans?

Timothy Gowers: I think that once computers are as good as, say, beginning PhD students at maths, then assuming there are lots of them that have different mathematical styles and tastes and can interact with each other in the way that human mathematicians do (in principle one computer could model an entire mathematical community but I think it’s more likely that it would be done by several different programs developed by several different teams), then I think one could just leave the programs to run and you’d see maths progress like human maths but much much faster. I don’t know exactly how good they’ll have to be before this singularity arises: the key property they’ll need to have is an ability to step back, think about what they are doing, and improve themselves. That’s why I went for a beginning PhD student, who has to go through a process like that. (Maybe humans would need to act as “research supervisors” just to get them going.)

Q3: Do you ever expect artificial intelligence to overwhelmingly outperform humans at typical academic research, in the way that they may soon overwhelmingly outperform humans at trivia contests, or do you expect that humans will always play an important role in scientific progress?

Timothy Gowers: I expect computers to become overwhelmingly better than humans at mathematical research, just as they are now overwhelmingly better at number crunching. But I think that they’ll also be able to explain a lot of what they do. It’s hard to imagine people training themselves in the skill needed to solve a maths problem if this happens, but maybe it will survive to some extent, just as people still enjoy playing chess now. But I think that the human activity of mathematical research as we know it will be killed off by this development. Quite what the knock-on effects of this will be are hard to predict. But for example I think if we can build good computer researchers, we’ll also be able to build amazing interactive teaching programs, so we needn’t worry that there will be nobody left to teach mathematics.

Q4: What probability do you assign to the possibility of an AI with initially roughly professional human-level competence (or better, perhaps unevenly) at general reasoning (including science, mathematics, engineering and programming) to self-modify its way up to vastly superhuman capabilities within a matter of hours/days/< 5 years?

Timothy Gowers: I have a similar view to what I said in (2). To get to roughly professional competence, a significant degree of self-modification will be needed, so I don’t really see how one can get to human levels without rapidly surpassing those levels.

Q5: How important is it to research risks associated with artificial intelligence that is good enough at general reasoning (including science, mathematics, engineering and programming) to be capable of radical self-modification, before attempting to build one?

Timothy Gowers: I really don’t know about this. I don’t think doing it just for maths is risky, because I think maths is sufficiently narrow that we don’t have to worry about things like whether the programs could become malign. But with more general intelligence I think it’s different. For example, if a program could pass a fairly modest Turing test, then one could build a spambot that would generate spam that was basically impossible to distinguish from non-spam. Imagine a blog that gets zillions of comments that are all perfectly sensible. Would it matter? That’s an interesting question, but it would certainly change things, and in general I think that keeping the internet going would be a serious challenge. On the plus side, one could also design better and better spam detectors, but I’m not sure how much comfort I get from that: by the definition of passing the Turing test, detection would appear to be impossible.

The general point here is that once self-modifying programs exist, people other than the original developers could use them for evil purposes. I don’t know how much of a problem that is. It applies to other things, such as nuclear weapons for instance.

Q6: What probability do you assign to the possibility of human extinction within 100 years as a result of AI capable of self-modification (that is not provably non-dangerous, if that is even possible)? P(human extinction by AI | AI capable of self-modification and not provably non-dangerous is created).

Timothy Gowers: I don’t know, but my instinct tells me that the probability is pretty small. In particular, I find it small enough that we are nowhere near the point where we should stop doing, or even slow down, research into AI.

Q7: How would you test if an artificial intelligence was at least as good as humans at mathematics?

Timothy Gowers: It would be sufficient (though maybe not necessary) to subject an artificial mathematician to the same tests that human mathematicians are subjected to. If it can write papers that attract the interest of human mathematicians, then its intelligence is as good as that of a human mathematician we judge to be producing results of a similar level of interest.

Actually, I want to qualify that. I would want my artificial mathematician not just to produce mathematics of a kind that a human might produce, but also to explain how it did so. If, for example, it made excessive use of brute-force search but ended up with the proof, when a human would get there much more efficiently, then it would be lacking something important.

Q8: Is it correct that in order to create an artificial mathematician it is first necessary to discover, prove and encode the mathematics of discovering and proving non-arbitrary mathematics (i.e. to encode a formalization of the natural language goal “be as good as humans at mathematics”)?

Timothy Gowers: I think it would be extremely helpful to formalize the notion of “interesting mathematics” (as opposed to arbitrary well-formed statements and logically valid proofs). However, again I would regard that as sufficient but not necessary: if programs were written that in practice produced mathematics of a similar nature to human mathematics, one might eventually have enough faith in them to believe that they had captured the notion of “interesting mathematics” without our having had to define it. (One could trivially define it as “something that will eventually be part of the output of the program”.)

Q9: What role does natural language proficiency play in human mathematics and what are the challenges at doing mathematics without it?

Timothy Gowers: I think that mathematicians are tempted by their training to reduce everything to small sets of assumptions, and to have very economical foundations — e.g. in set theory. But in practice we think with massively redundant sets of assumptions, and that’s important because it enables us to make connections easily that would otherwise not be obvious and would hold us up. So I think that really good automatic theorem provers will need to operate in a very high-level language — not necessarily quite as flexible as the entirety of the English language, but more like the kind of language mathematicians use when writing out a proof carefully (minus the side remarks, unless these too are quite precise).

Q10: The problems I see are the following: (1) If the formalization of what your artificial mathematician is supposed to do is very specific then most of the work requiring human-level intelligence has been done by whoever came up with that formalization and if (2) the formalization is very unspecific then it is not clear how to test for success, much less judge its efficiency.

Timothy Gowers: I don’t agree with what you say.

(1) A formalization of what constitutes interesting mathematics doesn’t have to be of the form, “This statement is interesting if and only if X.” Rather, it can be of the following form. “Let B be the current body of mathematical knowledge. A statement S is interesting relative to B if it is generated in manner X.” Then a statement is “eventually interesting relative to B” if it belongs to the closure of B under extension by interesting statements. We may well not be able to describe in advance what the statements in this closure will look like — indeed, it’s pretty certain that we won’t.

It’s true that a lot of human intelligence would be needed to come up with a formalization of that kind, but after that one could leave the computers chugging away, gradually (or not so gradually) building up from the current body of mathematical knowledge.

(2) I don’t see what’s wrong with an informal test of success: does the program produce what humans would regard as interesting results? Are there whole classes of results that it seems to be unable to discover? Etc. etc.
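The notion of a closure in point (1) above can be sketched as a fixed-point iteration. This is a toy illustration and not a proposal for what “manner X” should be: the generation rule `sums_below_limit`, the cap `LIMIT` and the starting body `B` are hypothetical stand-ins chosen so that the iteration terminates.

```python
def closure(body, generate, max_rounds=100):
    """Extend `body` by generated statements until nothing new appears."""
    known = set(body)
    for _ in range(max_rounds):
        new = generate(known) - known
        if not new:
            break  # fixed point reached: the closure is complete
        known |= new
    return known


LIMIT = 20  # hypothetical cap that keeps the toy closure finite


def sums_below_limit(known):
    """Toy stand-in for 'manner X': the sum of any two known numbers, up to LIMIT."""
    return {a + b for a in known for b in known if a + b <= LIMIT}


B = {3, 5}  # toy stand-in for the current body of knowledge
eventually_interesting = closure(B, sums_below_limit)
print(sorted(eventually_interesting))
```

The rule itself is short, but the contents of the closure are not obvious from reading it. For instance, 7 never appears while every number from 8 to 20 does.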


Below are some features of the kind of artificial general intelligence (short: AI) that people use as a model to infer that artificial general intelligence constitutes an existential risk:

  • It will want to self-improve
  • It will want to be rational
  • It will try to preserve its utility function
  • It will try to prevent counterfeit utility
  • It will be self-protective
  • It will want to acquire resources and use them efficiently

In short, they imagine a consequentialist expected utility maximizer.

Can we say anything specific about how such an AI could work in practice? And if we are unable to approximate a practical version of such an AI, is it then sensible to use it as a model to make predictions about the behavior of practical AIs?

A goal that is often used in such a context is <maximize paperclips>. How would an AI with the above mentioned features act given such a goal?

  • What would be its first action?
  • How long would it reason about its first action?
  • Is reasoning itself an action? If so, how long would it reason about (1) how to reason and (2) for how long to reason about reasoning…?
  • How would it deal with low probability possibilities such as (1) aliens that might try to destroy it, (2) time travel or (3) that this universe is being simulated and that the expected value of hacking the simulation does outweigh the low probability of success due to an enormous amount of resources that is conjectured to be available in the higher level universe in order to be capable of simulating this universe?

All of those questions can be answered by suggesting certain bounds and limitations. But if it is possible to limit such an AI in such a way as to make it disregard certain possibilities and to limit its planning horizon, or the amount of computational resources it uses, then how is it any harder to prevent it from causing human extinction? And if such bounds are not possible then how could it work at all? And if it does not work then how are the actions of such an AI decision-relevant for humans with respect to risks associated with practical AI?
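The effect of one such bound can be sketched with a toy expected value calculation. All probabilities and payoffs below are made-up numbers, and `min_probability` is a hypothetical cutoff. The sketch only shows how an unbounded expected utility maximizer gets dominated by a low-probability possibility and how a single bound removes that behavior.

```python
# Hypothetical options: (probability of success, payoff in paperclips).
OPTIONS = {
    "build_factory": (0.9, 1e6),
    "hack_simulation": (1e-12, 1e30),
}


def expected_value(probability, payoff):
    return probability * payoff


def choose(options, min_probability=0.0):
    """Pick the option with the highest expected value.

    Options whose probability of success lies below `min_probability`
    are disregarded entirely (the bound discussed in the text).
    """
    admissible = {
        name: value for name, value in options.items()
        if value[0] >= min_probability
    }
    return max(admissible, key=lambda name: expected_value(*admissible[name]))


print(choose(OPTIONS))                        # hack_simulation
print(choose(OPTIONS, min_probability=1e-6))  # build_factory
```

The bound is one line of design. It is the kind of limitation that has to be specified for the system to act sensibly at all.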

The existence of human intelligence does not support the possibility that anything resembling a consequentialist AI is practically possible:

(1) Humans are equipped by evolution with complex drives such as boredom or weariness, emotions such as fear or anger and bodily feedback such as pain and tiredness that, most of the time, save them from falling into any of the above traps that afflict expected utility maximizers.

(2) Humans do not maximize expected utility except in a few very limited circumstances. Humans have no static utility function and are therefore time-inconsistent.

There are certain models, such as AIXI, which prove that there is a general theory of intelligence. But AIXI is as far from real-world human-level general intelligence as an abstract notion of a Turing machine with an infinite tape is from a supercomputer with the computational capacity of the human brain. An abstract notion of intelligence does not get you anywhere in terms of real-world general intelligence, just as you won’t be able to upload yourself to a non-biological substrate merely because you showed that in some abstract sense you can simulate every physical process.


Practically unfeasible models of artificial general intelligence are very unreliable sources to be used to reason about the behavior of eventually achievable practical versions of artificial general intelligence.


How could an artificial general intelligence manage to outsmart humans? It would either have to be programmed to do so or be programmed to learn how to do so. In both cases it would need a very specific description of what constitutes improvement towards the goal and how to judge if a goal has been achieved. In other words, it will have to know what it means to win and therefore what exactly constitutes a mistake in order to learn from its mistakes.

Consider Mathematica, a computational software program. Mathematica works as intended. It hits the narrow target space of human volition. Mathematica is in many aspects superhuman at doing mathematics yet falls far short of replacing human mathematicians.

Mathematica is not capable of replacing human mathematicians because it is not yet possible to formalize, in sufficient detail, what it would mean to be better at mathematics than humans.

Take chess as an example of a human activity at which software is now able to beat humans. The reason is not that humans did not evolve to play chess. Humans did not evolve to do mathematics either. The difference between chess and mathematics is that chess has a specific terminal goal in the form of a clear definition of what constitutes winning. Although mathematics has unambiguous rules, there is no specific terminal goal and no clear definition of what constitutes winning.

The progress of the capability of artificial intelligence is not only related to whether humans have evolved for a certain skill or to how many computational resources it requires but also to how difficult it is to formalize the skill, its rules and what it means to succeed.

If you do not know what it is that you are supposed to do then you are unable to recognize if you have improved or committed a mistake.
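A toy search procedure illustrates the point. This is only a sketch under stated assumptions: `hill_climb`, `closeness_to_three` and `no_goal` are hypothetical names, and the target value 3.0 is arbitrary. The search can only improve if it is handed an objective that tells it what counts as better.

```python
import random


def hill_climb(start, objective, steps=1000, seed=0):
    """Keep a candidate and accept a random neighbour only if the
    objective says it is a strict improvement."""
    rng = random.Random(seed)
    best = start
    for _ in range(steps):
        candidate = best + rng.uniform(-1.0, 1.0)
        if objective(candidate) > objective(best):
            best = candidate
    return best


def closeness_to_three(x):
    """Hypothetical goal: get as close to 3.0 as possible."""
    return -abs(x - 3.0)


def no_goal(x):
    """No notion of success: every candidate scores the same."""
    return 0.0


with_goal = hill_climb(0.0, closeness_to_three)
without_goal = hill_climb(0.0, no_goal)
```

With the objective the search ends up near 3.0. Without one it never leaves its starting point, because no candidate can be recognized as an improvement or as a mistake.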

If your aim is to accurately model language you might start with a model of word probabilities. But word probabilities are insufficient to beat humans at language. The exceptions and subtleties of language require new probabilistic models to capture capabilities such as emotional emphasis, recognizing context and meaning. What constitutes winning is becoming increasingly complex and wide-ranging as one approaches human-level capabilities, whereas the rules and objective of chess stay constant.
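As a rough illustration of why word probabilities alone fall short, here is a minimal sketch of the simplest such model, which estimates each word's probability from its relative frequency. The toy corpus and the function names are hypothetical.

```python
from collections import Counter


def word_probabilities(text):
    """Estimate each word's probability as its relative frequency."""
    words = text.lower().split()
    counts = Counter(words)
    total = len(words)
    return {word: count / total for word, count in counts.items()}


def sentence_score(sentence, probs):
    """Multiply per-word probabilities; unseen words score zero."""
    score = 1.0
    for word in sentence.lower().split():
        score *= probs.get(word, 0.0)
    return score


# Hypothetical toy corpus.
corpus = "the dog bit the man"
probs = word_probabilities(corpus)

# The two sentences mean different things, but a model that only
# knows word frequencies cannot tell them apart.
same_score = (
    sentence_score("the dog bit the man", probs)
    == sentence_score("the man bit the dog", probs)
)
```

Both sentences receive the same score because the model ignores word order, let alone context or meaning. Capturing those requires a richer model and a richer definition of success.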

Consider the goal <build a house>. What exactly would be a mistake? Would thinking about it for a trillion years be mistaken? Would creating a virtual model of a house be a mistake? Any of the infinitely many possible interpretations of <build a house> has a different subset of instrumental goals. Which means that it is not clear what exactly is a mistake as long as you do not supply a very good description of what <build a house> means and what world states would constitute improvement.

To succeed at beating humans at any activity you have to hit a very narrow target space. Once it can be formalized what it takes to beat humans at a certain activity the resulting software will do exactly what it was intended to do, namely beating humans at that activity.

The important point here is that when it comes to software behaving as intended, and therefore safely, the goal <become superhuman good at mathematics> is in no relevant respect different from the goal <build a house>. Both goals require the programmer to supply a formalized description of their intention and thereby hit the narrow target of human volition.

As I wrote in my last post, any system that would mistake a description of <build a house> or <become superhuman good at mathematics> for <kill all humans> would never be able to kill all humans because it would make similar misinterpretations when it comes to solving problems in mathematics and physics, problems that must be solved in order to kill all humans.


People who claim that artificial general intelligence is going to constitute an existential risk implicitly conjecture that whoever is going to create such an AI will know perfectly well how to formalize capabilities such as <become superhuman good at mathematics> while at the same time they will fail selectively at making it solve the mathematics they want it to solve and instead cause it to solve the mathematics that is necessary to kill all humans.

If you claim that it is possible to define the capability <become superhuman good at mathematics> then you will need a very good argument in order to support the claim that at the same time it is difficult to define goals such as <build a house> without causing human extinction.


As an addendum to my last post I want to note that incorrect answers by IBM Watson are not comparable to human extinction as a side-effect or failure of an artificial general intelligence.

Watson was supposed to win at Jeopardy! and succeeded, whereas artificial general intelligence is conjectured to work sufficiently well at not doing what it is supposed to do.

You need a lot of ingenuity to cause human extinction. Your artificial general intelligence will have to work perfectly, exactly as it was intended to work. A perfectly working machine does not, however, commit such mistakes as to cause human extinction in order to win at Jeopardy!, as long as it was not explicitly built to do that.

IBM Watson committed mistakes. An artificial general intelligence that is supposed to outsmart humanity has a very small margin for error. If an artificial general intelligence was prone to commit errors on the scale of confusing goals such as <win at Jeopardy!> with <kill all humans> then it would never succeed at killing all humans because it would make similar mistakes on a wide variety of problems that are necessary to solve in order to do so.


It is predicted that artificial general intelligence (short: AI) constitutes an existential risk (short: risk).

Below is a comparison chart that I believe reflects what AI risk advocates believe about how a general artificial intelligence differs from a narrow artificial intelligence in its behavior given the same task.


Comparison Chart: Narrow vs. General Artificial Intelligence

(According to AI risk advocates.)

Narrow artificial intelligence will be denoted NAI and general artificial intelligence GAI.

(1) Is it in principle capable of behaving in accordance with human intention to a sufficient degree?

NAI: True

GAI: True

(2) Under what circumstances does it fail to behave in accordance with human intention?

NAI: If it is broken, where broken stands for a wide range of failure modes such as incorrectly managing memory allocations.

GAI: In all cases in which it is not mathematically proven to be tasked with the protection of, and equipped with, a perfect encoding of all human values or a safe way to obtain such an encoding.

(3) What happens when it fails to behave in accordance with human intention?

NAI: It crashes, freezes or halts. It generally fails in a way that is harmful to its own functioning. If for example an autonomous car fails at driving autonomously it usually means that it will either go into safe-mode and halt or crash.

GAI: It works perfectly well. Superhumanly well. All its intended capabilities are intact except that it completely fails at working as intended in such a way as to destroy all human value in the universe. It will be able to improve itself and capable of obtaining a perfect encoding of human values. It will use those intended capabilities in order to deceive and overpower humans rather than doing what it was intended to do.

(4) What happens if it is bound to use a limited amount of resources, use a limited amount of space or run for a limited amount of time?

NAI: It will only ever do what it was programmed to do. As long as there is no fatal flaw, harming its general functionality, it will work within the defined boundaries as intended.

GAI: It will never do what it was programmed to do and always remove or bypass its intended limitations in order to pursue unintended actions such as taking over the universe.


The current beliefs of most experts in the field of AI do not seem to support the view that the behavior outlined in the chart above is a likely outcome. See for example Peter Norvig, 2012:

Personally, I think that the last invention we need ever make is the partnership of human and tool. Paralleling the move from mainframe computers in the 1970s to personal computers today, most AI systems went from being standalone entities to being tools that are used in a human-machine partnership.

Our tools will get ever better as they embody more intelligence. And we will become better as well, able to access ever more information and education. We may hear less about AI and more about IA, that is to say “intelligence amplification”. In movies we will still have to worry about the machines taking over, but in real life humans and their sophisticated tools will move forward together.

I believe that AI risk advocates need to provide a lot of technical details and specific arguments to support the above chart. Arguing by definition alone is insufficient. The behavior outlined above has to be shown not only to be in principle possible but to be a probable result of actual research and development.

What is it that makes a general intelligence, as opposed to a narrow intelligence, behave in such a way as to result in human extinction?

What can be said about a general intelligence that can’t be said about a narrow intelligence such as IBM Watson? Both systems can be interpreted, implicitly, to have a utility function. And even a thermostat could be interpreted to have a terminal goal. Yet a narrow intelligence, an expert system, is characterized as achieving its goal while a generally intelligent agent is characterized as achieving its goal and in addition pursuing activities that will cause human extinction.
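To make the thermostat remark concrete, here is a minimal sketch of a thermostat written as if it maximized a utility function. The action names and their one-degree effects are hypothetical assumptions, not a description of any real device.

```python
def utility(temperature, setpoint):
    """Implicit 'utility': higher when the room is closer to the setpoint."""
    return -abs(temperature - setpoint)


# Hypothetical effect of each action on the room temperature, in degrees.
ACTION_EFFECTS = {"heat": 1.0, "idle": 0.0, "cool": -1.0}


def thermostat_action(temperature, setpoint):
    """Choose the action whose predicted outcome has the highest utility."""
    return max(
        ACTION_EFFECTS,
        key=lambda action: utility(temperature + ACTION_EFFECTS[action], setpoint),
    )
```

For example, `thermostat_action(18.0, 21.0)` returns `"heat"` and `thermostat_action(21.0, 21.0)` returns `"idle"`. Having a goal in this implicit sense does not by itself lead to anything beyond the three listed actions.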
