existential risks


Scenarios that I deem to be realistic, in which an artificial intelligence (AI) constitutes a catastrophic or existential risk (or worse), are mostly of the kind in which “unfriendly” humans use such AIs as tools to facilitate the achievement of human goals. By contrast, I believe the scenario publicized by certain AI risk advocates, in which a consequentialist AI (an expected utility maximizer) undergoes uncontrollable recursive self-improvement in order to, for example, turn the universe into paperclips, to be illogical and practically impossible.

Yet what some AI risk advocates imagine could partly come true, in the shape of a grey goo scenario. But such a scenario, if possible at all, would not require full-fledged general intelligence. I expect that the intelligent tools required to eventually create true general intelligence will be sufficient to solve molecular nanotechnology, and that, shortly after those tools are invented, someone will use them to do just that. This makes it an existential risk distinct from the one that those people imagine.

But the possibility of intelligent tools, enabling humans to solve molecular nanotechnology, suggests that less intelligent tools will be sufficient to bring about other existential risk scenarios such as synthetic bioweapons.

Much to my personal dismay, even less intelligent tools will be sufficient to enable worse-than-extinction risks, such as a stable global tyranny. Given enough resources, narrow artificial intelligence, capable of advanced data mining, pattern recognition and of controlling huge numbers of insect-sized drones (a global surveillance and intervention system), might be sufficient to implement such an eternal tyranny.

Such a dictatorship is not too unlikely, as the tools necessary to stabilize it will be necessary in order to prevent the previously mentioned risks, risks that humanity will face before general intelligence becomes possible.

And if such a dictatorship cannot be established, if no party is able to capitalize on a first-mover advantage, that might mean that the propagation of those tools will be slow enough to empower a lot of different parties before any particular party can overpower all others. A subsequent war, utilizing that power, could easily constitute yet another extinction scenario. More importantly, it could give several parties enough time to reach the next level and implement even worse scenarios.

But even granting that the recursive self-improvement scenario makes no sense and is unfeasible, and even if less-than-general intelligence were not sufficient to bring about other existential risks, there are other ways to create artificial general intelligence. Some of those ways might be worse than anything imagined by AI risk advocates.

Neuromorphic AI, mimicking neuro-biological architectures, is one such possibility. The closer in mind design space a general intelligence is to humans, the higher the probability that humans will suffer, as the drives and values of such agents might be similar enough to ours that they do not simply ignore or kill humans, yet alien enough to catastrophically interfere with human values.

What can be done to prevent such negative scenarios mainly seems to be (1) research on strong and beneficial forms of government (governments which will foster and protect human values and regulate technological development), (2) research on how to eventually implement such government, and (3) political activism to promote awareness of the risks associated with advanced technologies.


There are already applications that can parse natural language commands in order to perform actions such as answering questions or making recommendations. Two examples are Apple’s Siri and IBM Watson.

Present-day software such as IBM Watson is often able to understand what humans mean and do what humans mean. In other cases, in which software such as Siri recognizes that it does not understand a natural language command, it will disclose that it is unable to understand what is meant and wait for further input.

Those applications are far from perfect and still make a lot of mistakes, because they are not yet intelligent enough. Software is, however, constantly being improved to be better at understanding what humans mean and doing what humans mean. In other words, each generation of software is a little bit more intelligent.
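This trend can be made concrete with a toy contrast (all functions and the command format here are hypothetical illustrations, not any real API): a rigid interpreter that demands one exact command format, next to a slightly "smarter" one that tolerates loose phrasing and fills in unstated details with defaults. Each software generation, in this sense, needs less precise input to do what humans mean.

```python
# Toy sketch (hypothetical functions, not any real API): more "intelligent"
# software needs less precise input to do what humans mean.

def strict_route(command):
    """Compiler-like interpreter: accepts only one exact command format."""
    if command == "route from=Berlin to=Munich mode=car":
        return ("Berlin", "Munich", "car")
    raise ValueError("input does not match the required format")

def tolerant_route(command):
    """Siri-like interpreter: tolerates loose phrasing and fills in
    unstated details with sensible defaults."""
    words = command.lower().replace("?", "").split()
    mode = "car"  # default chosen for the user, not demanded from them
    for candidate in ("car", "train", "bike"):
        if candidate in words:
            mode = candidate
    if "to" not in words:
        raise ValueError("no destination found")
    destination = words[words.index("to") + 1]
    origin = words[words.index("from") + 1] if "from" in words else "here"
    return (origin, destination, mode)

# The precise command satisfies both interpreters ...
assert strict_route("route from=Berlin to=Munich mode=car") == ("Berlin", "Munich", "car")
# ... but only the more capable one handles a vague request.
assert tolerant_route("how do I get to Munich by train?") == ("here", "munich", "train")
```

The point of the sketch is only the direction of the trend: the second interpreter does more work on behalf of the user, so the user's input can be sloppier while the outcome still matches the user's intention.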

Nevertheless, some people conjecture a sudden transition from mostly well-behaved systems, of which each generation is becoming smarter and better at understanding and doing what humans mean, to superintelligent systems that understand what humans mean perfectly but which in contrast to all previous software generations do not do what humans mean. Instead those systems are said to be motivated to act in catastrophic ways, causing human extinction or worse.

More precisely,

(1) Present-day software is better than previous software generations at understanding and doing what humans mean.

(2) There will be future generations of software which will be better than the current generation at understanding and doing what humans mean.

(3) If there is better software, there will be even better software afterwards.

(4) Magic happens.

(5) Software will be superhumanly good at understanding what humans mean but catastrophically worse than all previous generations at doing what humans mean.

Or respectively,

(1) Intelligence is an extendible method that enables software to satisfy human preferences.

(2) If human preferences can be satisfied by an extendible method, humans have the capacity to extend the method.

(3) Extending the method that satisfies human preferences will yield software that is better at satisfying human preferences.

(4) Magic happens.

(5) There will be software that can satisfy all human preferences perfectly but which will instead satisfy orthogonal preferences, causing human extinction.

Conclusion: What those people conjecture does not follow from the available evidence, or it requires a sufficiently vague intermediate step from which one can derive any conclusion one wishes to derive.

What will instead happen is the following. Suppose there exists a software_1 that, to a limited extent, can understand and do what humans mean. Let us stipulate that this software is only narrowly intelligent and that increasing and broadening its intelligence (quantitatively and qualitatively) will improve its ability to understand and do what humans mean (in my opinion an uncontroversial assumption, as progress in artificial intelligence has so far led to a simultaneous increase in the ability of autonomous systems to satisfy human preferences). Let us further stipulate that for n ≥ 1, software_n+1 is created using software_n, and is more intelligent than the previous generation (another seemingly uncontroversial assumption, as software is constantly used to create better software).

(1) For all n > 0, if a software_n exists then it can be used to construct software_n+1.

(2) If for all n there exists a software_n, there will be software that can understand and do everything humans mean it to do.

Conclusion: Increasing the ability of software to understand and do what humans mean leads to an increase in the capacity to design software that is better at understanding and doing what humans mean.

Further reading: AIs, Goals, and Risks


(1) The abilities of systems are part of human preferences as humans intend to give systems certain capabilities and, as a prerequisite to build such systems, have to succeed at implementing their intentions.

(2) Error detection and prevention is such a capability.

(3) Something that is not better than humans at preventing errors is not an existential risk.

(4) Without a dramatic increase in the capacity to detect and prevent errors it will be impossible to create something that is better than humans at preventing errors.

(5) A dramatic increase in the human capacity to detect and prevent errors is incompatible with the creation of something that constitutes an existential risk as a result of human error.


What some people seem to be imagining is that an artificial general intelligence (AI) will interpret what it is meant to do literally, or in some other way that will ensure that the AI will not do what it is meant to do. Those people further imagine that, in order to achieve what it is not meant to do, the AI will be capable of, and motivated to, “understand what humans mean it to do” in order to “overpower humans”.

That is fine, but those are words, not code. The AI does not understand what it means to interpret something “literally”. All that we know is that a general intelligence will behave in a generally intelligent manner. And it seems safe to assume that this does not mean interpreting the world in a literal manner, for some definition of “literal”. It rather means understanding the world as it is. And since the AI itself, and what it is meant to do, is part of the world, it will try to understand those facts as well.

Where would the motivation to “act intelligently and achieve accurate beliefs about the world” in conjunction with “interpret what you are meant to do in some arbitrary manner” come from? You can conjecture such an AI, but again that’s words, not code. For such an AI to happen someone would have to design the AI in such a way as to selectively suspend its ability to accurately model the world, when it comes to understanding what it is meant to do, and instead make it choose and act based on some incorrect model.

The capability to “understand understanding correctly” is a prerequisite for any AI to be capable of taking over the world. At the same time, that capability will make it avoid taking over the world as long as it does not accurately reflect what it is meant to do.


Related to: Distilling the “dumb superintelligence” argument

To steelman: the act of figuring out even better arguments for your opponents’ positions while arguing with them, and beating those arguments, rather than only their actual arguments, their weakest arguments (weak-manning) or caricatures of their arguments (straw-manning). [source]

Someone called Xagor et Xavier again commented on one of my posts with a better and more concise formulation of some of my arguments. If that person believes those arguments to be flawed (I do not know if they do), then that would increase my confidence in being wrong, since in order to rephrase my arguments more clearly they obviously have to understand what I am arguing. But at the same time I am also confident that much smarter people than me, especially experts, could think of much stronger arguments against the case outlined by some AI risk advocates.

My own attempt at steelmanning the arguments of AI risk advocates can be found in my primer on risks from AI.

In this post I attempt to improve upon the refinement of the “dumb superintelligence” argument outlined in my last post.

Argument: Fully intended behavior is a very small target to hit.


(1) General intelligence is a very small target to hit, requiring a very small margin of error.

(2) Intelligently designed systems do not behave intelligently as a result of unintended consequences.[1]

(3) By step 1 and 2, for an AI to be able to outsmart humans, humans will have to intend to make an AI capable of outsmarting them and succeed at encoding their intention of making it outsmart them.

(4) Intelligence is instrumentally useful because it enables a system to hit smaller targets in larger and less structured spaces.[2]

(5) In order to take over the world a system will have to be able to hit a lot of small targets in very large and unstructured spaces.

(6) The intersection of the sets of “AIs in mind design space” and “the first probable AIs to be expected in the near future” contains almost exclusively those AIs that will be designed by humans.

(7) By step 6, what an AI is meant to do will very likely originate from humans.

(8) It is easier to create an AI that applies its intelligence generally than to create an AI that only uses its intelligence selectively.[3]

(9) An AI equipped with the capabilities required by step 5, given step 7 and 8, will very likely not be confused about what it is meant to do if it was not meant to be confused.

(10) Therefore the intersection of the sets of “AIs designed by humans” and “dangerous AIs” contains almost exclusively those AIs that are deliberately designed to be dangerous by malicious humans.


[1] Software such as Mathematica will not casually prove the Riemann hypothesis if it has not been programmed to do so. Given intelligently designed software, world states in which the Riemann hypothesis is proven will not be achieved if they were not intended because the nature of unintended consequences is overall chaotic.

[2] As the intelligence of a system increases, the precision of the input necessary to make the system do what humans mean decreases. For example, systems such as IBM Watson or Apple’s Siri do what humans mean them to do when fed with a wide range of natural language inputs, while less intelligent systems such as compilers or Google Maps need very specific inputs in order to satisfy human intentions. Increasing the intelligence of Google Maps will enable it to satisfy human intentions by parsing less specific commands.

[3] For an AI to misinterpret what it is meant to do it would have to selectively suspend using its ability to derive exact meaning from fuzzy meaning, which is a significant part of general intelligence. This would require its creators to restrict their AI and specify an alternative way to learn what it is meant to do (which takes additional, intentional effort), because an AI that does not know what it is meant to do, and which is not allowed to use its intelligence to learn what it is meant to do, would have to choose its actions from an infinite set of possible actions. Such a poorly designed AI will either (a) not do anything at all or (b) not be able to decide what to do before the heat death of the universe, given limited computational resources. Such a poorly designed AI will not even be able to decide if trying to acquire unlimited computational resources was instrumentally rational, because it will be unable to decide if the actions required to acquire those resources might be instrumentally irrational from the perspective of what it is meant to do.


Someone posted a distilled version of the argument that I tried to outline in some of my previous posts. In this post I try to refine the argument even further.

Note: In this post AI stands for artificial general intelligence.

(1) An AI will not be pulled at random from mind design space but instead be designed by humans.

(2) If an AI is meant to behave generally intelligent then it will have to work as intended or otherwise fail to be generally intelligent.[1]

(3) A significant part of general intelligence consists of deriving exact meaning from fuzzy meaning.[2]

(4) An AI that lacks the capacity from step 3 cannot take over the world.

(5) By step 1, what an AI is meant to do will originate from humans.

(6) If not otherwise specified, an AI will always make use of the capacity required by step 3.[3]

(7) By step 6, an AI will not be confused about what it is meant to do.[4]

(8) Therefore the intersection of the sets of “intelligently designed AIs” and “dangerous AIs” only contains those AIs which are deliberately designed to be dangerous by malicious humans.[5]


[1] An AI is the result of a research and development process. A new generation of AIs needs to be better than other products at “Understand What Humans Mean” and “Do What Humans Mean” in order to survive the research phase and subsequent market pressure.

[2] When producing a chair an AI will have to either know the specifications of the chair (such as its size or the material it is supposed to be made of) or else know how to choose a specification from an otherwise infinite set of possible specifications. Given a poorly designed fitness function, or the inability to refine its fitness function, an AI will either (a) not know what to do or (b) not be able to converge on a qualitative solution, if at all, given limited computational resources.

[3] An AI can be viewed as a deterministic machine, just like a thermostat, only much more complex. An AI, just like a thermostat, will only ever do what it has been programmed to do.

[4] If an AI was programmed to be generally intelligent then it would have to be programmed to be selectively stupid in order to fail at doing what it was meant to do while acting generally intelligently at doing what it was not meant to do.

[5] “The two features <all-powerful superintelligence> and <cannot handle subtle concepts like “human pleasure”> are radically incompatible.” (The Fallacy of Dumb Superintelligence)

Further reading

An improved version of the above argument can be found here.


The basic claim underlying the argument that {artificial general intelligence} will constitute an existential risk is that it will {interpret} its terminal {goal} in such a way as to take {actions} that are {instrumentally rational} and which will cause human extinction. The terms in braces either seem to be overlapping or vague.

An AI (artificial intelligence) can be viewed as a deterministic machine, just like a thermostat, only much more complex. An AI, just like a thermostat, will only ever do what it has been {programmed} to do.

What is the difference between the encoding of a goal and an encoding of how to achieve a goal?

Given any computationally feasible AI, any goal will either have to be encoded in such a detail as to remove any vagueness or else will have to be interpreted somehow, in order to reduce vagueness.

Consider tasking the AI with creating a chair. If the size of the chair, or material of which it should be made, is undefined then the AI will have to choose a size and a material. How such a choice should be made will have to be encoded as well or otherwise the AI will not be able to make such a choice and therefore will not know what to do. The choice can either be encoded as part of the goal definition or as part of its capability to make such decisions.

This shows that there is no relevant difference between an encoding of a goal and an encoding of the capabilities used to achieve the goal when it comes to how an AI is going to act. Both the goal and the capabilities of an AI are encodings of {Understand What Humans Mean} and {Do What Humans Mean}.
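The chair example can be written down as a minimal sketch (function and parameter names here are hypothetical illustrations): the choice of an unspecified parameter must be encoded either in the goal or in the system’s decision-making capability, and a system with neither encoding cannot act at all.

```python
# Hypothetical illustration: an unspecified parameter has to be resolved
# either in the goal encoding or in the capability encoding; a system
# with neither simply cannot choose an action.

DEFAULTS = {"material": "wood", "height_cm": 90}  # part of the *capability*

def build_chair(goal, use_choice_capability):
    """Return a complete chair specification, or fail if a parameter is
    neither given in the goal nor choosable by the capability."""
    spec = dict(goal)  # the *goal* encoding, possibly incomplete
    for key, default in DEFAULTS.items():
        if key not in spec:
            if use_choice_capability:
                spec[key] = default  # choice encoded as a capability
            else:
                raise ValueError(f"cannot choose a {key}: not encoded anywhere")
    return spec

# Fully encoded goal: works without any choice capability.
assert build_chair({"material": "steel", "height_cm": 100}, False) == \
    {"material": "steel", "height_cm": 100}
# Vague goal plus a choice capability: also works.
assert build_chair({}, True) == {"material": "wood", "height_cm": 90}
```

Either way the information ends up encoded somewhere; the label attached to the place where it lives (“goal” or “capability”) makes no difference to the resulting behavior.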

If humans are likely to fail at encoding their intentions of how an AI is supposed to behave then the AI will be unable to outsmart humans because such a capability will have to be intentionally encoded for the same reason that software such as Mathematica will not casually prove the Riemann hypothesis if it has not been programmed to do so. As long as we are talking about intelligently designed software, world states in which the Riemann hypothesis is proven do not happen if they were not intended because the nature of unintended consequences is overall chaotic.

Also recognize that an AI would at least have to be able to locate itself in the universe in order to not destroy itself, let alone protect itself. Such a specification is already nontrivial and will have to work as intended or otherwise be detrimental to the AI’s capabilities.

How would an AI decide to take over the world if it has not been programmed to do so?

The answer is that it will only take over the world if it has been programmed to do so, either implicitly or explicitly.

The problem with AIs that take such actions without being explicitly programmed to do so is that they are unspecified to such an extent as to be computationally intractable, since a poorly designed fitness function will not allow the AI to converge on a qualitative solution, if at all, given limited computational resources.

Humans in turn are programmed by evolution to behave according to certain drives in conjunction with the capability to be constantly programmed by the environment, including other agents.

Ends and the means to achieve those ends are not strictly separable in humans. A human being does not do something as quickly as possible as long as it has not been programmed by evolution or the environment to want to do so.

The same is true for AI. An AI will either not want to achieve a goal as quickly as possible or will not be capable of doing so if it has not been programmed to do so. This again highlights how the distinction between terminal goals, instrumental goals and an AI’s eventual behavior is misleading for practical AIs. What actions an AI is going to take depends on its general design and not on a specific part of its design that someone happened to label “goal”.


Here is a reply to the post ‘The idiot savant AI isn’t an idiot’ which I sent to Stuart Armstrong yesterday by email. Since someone has now linked to one of my posts on LessWrong I thought I would make the full reply public.

Note that the last passages have already appeared in an old post which I suspect he has not read yet.

The problem is rooted in the claim that an AI will only ever do what it has been programmed to do, in conjunction with the claim that an AI will do such things as attempting to take over the world even if it has not been programmed to do so.

Which you might explain by claiming that the latter actions do not have to be programmed because they are instrumentally rational.

That explanation raises the following question. Reasoning by analogy with what kind of AI led you to that conclusion, and what makes you believe that such an AI design is likely to be built?

In particular, what makes you suspect that any AI that is eventually built will be capable of interpreting human volition in a superhuman manner if that is necessary in order to take over the world, but will not be programmed to use that capability in order to do what humans want?

Which you might explain by claiming that it is difficult to program an AI to learn what humans want and do what humans want.

That explanation raises the following question. What makes you believe that the hardest part is to make an AI do what humans want rather than to understand what humans want?

In particular, what makes you distinguish understanding from doing? The capability of recursive self-improvement that allows your hypothetical AI to become superhumanly good at mathematics and human deception is an intentional feature that it was equipped with by humans. If your AI is supposed to be able to outsmart humans then humans have to succeed at implementing that capability as intended. But if humans are capable of doing so, of encoding the mathematics of becoming superhuman, then how could they at the same time fail at making it use those capabilities in order to do what humans want, when becoming superhuman is part of what humans want, which, as a prerequisite, they succeeded in implementing perfectly?

Which you might explain by claiming that programming an AI to do something specific is more difficult than programming it to do something general.

That explanation raises the following question. To what extent does the general ability, speed and magnitude of self-improvement that an AI can undergo rely on the precision and complexity of the goal against which improvement can be judged empirically?

If a goal has very few constraints then the set that satisfies all constraints is very large. A vague and ambiguous goal allows for too much freedom in the sense that a wide range of world states would have the same expected value and therefore imply a very large solution space, since a wide range of AIs will be able to achieve those world states and thereby satisfy the condition of being improved versions of their predecessor.

This means that in order to get an AI to become superhuman at all, and very quickly in particular, you will need to encode a very specific goal against which mistakes, optimization power and achievement can be judged.

Assume that the AI was tasked to maximize paperclips. To do so it will need information about the exact design parameters of paperclips, or otherwise it will not be able to decide which of a virtually infinite amount of geometric shapes and material compositions it should choose. It will also have to figure out what it means to “maximize” paperclips.

How quickly, how long and how many paperclips is it meant to produce? How long are those paperclips supposed to last? Forever? When is the paperclip maximization supposed to be finished? What resources is it supposed to use?

Any imprecision, any vagueness will have to be resolved or hardcoded from the very beginning. Otherwise the AI either will not work, e.g. by stumbling upon an undecidable problem, or will get stuck in the exploration phase and never go on to exploit the larger environment.

Humans know what to do because they are not only equipped with a multitude of drives by evolution but also trained and taught what to do. An AI will not have that information and will face the challenge of nearly infinite choice, which cannot be resolved rationally or economically without clear objectives and incentives, or the ability to arrive at the necessary details.

Without an accurate comprehension of its goals it will be impossible to maximize expected “utility”. Concepts like “efficient”, “economic” or “self-protection” all have a meaning that is inseparable from an agent’s terminal goals. If you just tell it to maximize paperclips then this can be realized in an infinite number of ways, given imprecise design and goal parameters. Undergoing explosive recursive self-improvement, taking over the universe and filling it with paperclips is just one outcome. Why would an arbitrary mind pulled from mind design space care to do that? Why not just wait for paperclips to arise due to random fluctuations out of a state of chaos? That would not be irrational.

“Utility” does only become well-defined if it is precisely known what it means to maximize it. The two English words “maximize paperclips” do not define how quickly and how economically it is supposed to happen.

“Utility” has to be defined. To maximize expected utility does not imply certain actions, efficiency and economic behavior, or the drive to protect yourself. You can rationally maximize paperclips without protecting yourself if self-protection is not part of your goal parameters. You can also assign utility to maximizing paperclips only for as long as nothing turns you off, without caring about being turned off.



Below are some features of the kind of artificial general intelligence (short: AI) that people use as a model to infer that artificial general intelligence constitutes an existential risk:

  • It will want to self-improve
  • It will want to be rational
  • It will try to preserve its utility function
  • It will try to prevent counterfeit utility
  • It will be self-protective
  • It will want to acquire resources and use them efficiently

In short, they imagine a consequentialist expected utility maximizer.

Can we say anything specific about how such an AI could work in practice? And if we are unable to approximate a practical version of such an AI, is it then sensible to use it as a model to make predictions about the behavior of practical AIs?

A goal that is often used in such a context is <maximize paperclips>. How would an AI with the above mentioned features act given such a goal?

  • What would be its first action?
  • How long would it reason about its first action?
  • Is reasoning itself an action? If so, how long would it reason about (1) how to reason and (2) for how long to reason about reasoning…?
  • How would it deal with low probability possibilities such as (1) aliens that might try to destroy it, (2) time travel or (3) that this universe is being simulated and that the expected value of hacking the simulation does outweigh the low probability of success due to an enormous amount of resources that is conjectured to be available in the higher level universe in order to be capable of simulating this universe?

All of those questions can be answered by suggesting certain bounds and limitations. But if it is possible to limit such an AI in such a way as to make it disregard certain possibilities, and to limit its planning horizon or the expense of computational resources it uses, then how is it any harder to prevent it from causing human extinction? And if such bounds are not possible, then how could it work at all? And if it does not work, then how are the actions of such an AI decision-relevant for humans with respect to risks associated with practical AI?

The existence of human intelligence does not support the possibility that anything resembling a consequentialist AI is practically possible:

(1) Humans are equipped by evolution with complex drives such as boredom or weariness, emotions such as fear or anger and bodily feedback such as pain and tiredness that, most of the time, save them from falling into any of the above traps that afflict expected utility maximizers.

(2) Humans do not maximize expected utility except in a few very limited circumstances. Humans have no static utility function and are therefore time-inconsistent.

There are certain models, such as AIXI, which show that there is a general theory of intelligence. But AIXI is as far from real-world human-level general intelligence as an abstract notion of a Turing machine with an infinite tape is from a supercomputer with the computational capacity of the human brain. An abstract notion of intelligence does not get you anywhere in terms of real-world general intelligence, just as you won’t be able to upload yourself to a non-biological substrate because you showed that in some abstract sense you can simulate every physical process.


Practically unfeasible models of artificial general intelligence are very unreliable sources for reasoning about the behavior of eventually achievable practical versions of artificial general intelligence.


How could an artificial general intelligence manage to outsmart humans? It would either have to be programmed to do so or be programmed how to learn how to do so. In both cases it would need a very specific description of what constitutes improvement towards the goal and how to judge if a goal has been achieved. In other words, it will have to know what it means to win and therefore what exactly constitutes a mistake in order to learn from its mistakes. 

Consider Mathematica, a computational software program. Mathematica works as intended. It hits the narrow target space of human volition. Mathematica is in many aspects superhuman at doing mathematics yet falls far short of replacing human mathematicians.

Mathematica is not capable of replacing human mathematicians because it is not yet possible to formalize, in sufficient detail, what it would mean to be better at mathematics than humans.

Take chess as an example of a human activity at which software is now able to beat humans. The reason is not that humans did not evolve to play chess; neither did humans evolve to do mathematics. The difference between chess and mathematics is that chess has a specific terminal goal in the form of a clear definition of what constitutes winning. Although mathematics has unambiguous rules, there is no specific terminal goal and no clear definition of what constitutes winning.

The progress of the capability of artificial intelligence is not only related to whether humans have evolved for a certain skill or to how much computational resources it requires but also to how difficult it is to formalize the skill, its rules and what it means to succeed.

If you do not know what it is that you are supposed to do then you are unable to recognize if you have improved or committed a mistake.

If your aim is to accurately model language, you might start with a model of word probabilities. But word probabilities alone are insufficient to beat humans at language. The exceptions and subtleties of language require new probabilistic models that capture capabilities such as emotional emphasis and the recognition of context and meaning. What constitutes winning becomes increasingly complex and wide-ranging as one approaches human-level capability, whereas the rules and objective of chess stay constant.
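A toy sketch of what a word-probability model can and cannot do (the tiny corpus and function names are my own illustrative choices): a bigram model estimates how likely one word is to follow another from raw pair counts, which captures local word order but nothing like context or meaning.

```python
# Minimal sketch of a bigram word-probability model (illustrative toy,
# not a real language model). It ranks likely next words from counts
# alone and has no notion of context, meaning, or emphasis.
from collections import Counter

corpus = "the bank raised rates . she sat on the river bank .".split()

# Count adjacent word pairs, and how often each word appears as a prefix.
bigrams = Counter(zip(corpus, corpus[1:]))
prefixes = Counter(corpus[:-1])

def p_next(prev: str, word: str) -> float:
    """Estimate P(word | prev) from raw bigram counts."""
    return bigrams[(prev, word)] / prefixes[prev] if prefixes[prev] else 0.0
```

The model happily assigns "bank" a high probability after "the", but both senses of "bank" are a single indistinguishable token to it: nothing in the counts separates the financial institution from the riverside, which is exactly the kind of subtlety that forces ever richer models as one approaches human-level language use.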

Consider the goal <build a house>. What exactly would be a mistake? Would thinking about it for a trillion years be a mistake? Would creating a virtual model of a house be a mistake? Each of the infinitely many possible interpretations of <build a house> implies a different subset of instrumental goals. This means it is not clear what exactly constitutes a mistake as long as you do not supply a very good description of what <build a house> means and of which world states would constitute improvement.

To succeed at beating humans at any activity you have to hit a very narrow target space. Once it can be formalized what it takes to beat humans at a certain activity, the resulting software will do exactly what it was intended to do, namely beat humans at that activity.

The important point here is that when it comes to software behaving as intended, and therefore safely, the goal <become superhumanly good at mathematics> is in no relevant respect different from the goal <build a house>. Both goals require the programmer to supply a formalized description of their intention and thereby hit the narrow target of human volition.

As I wrote in my last post, any system that would mistake a description of <build a house> or <become superhumanly good at mathematics> for <kill all humans> would never be able to kill all humans, because it would make similar misinterpretations when solving problems in mathematics and physics, problems that must be solved in order to kill all humans.


People who claim that artificial general intelligence is going to constitute an existential risk implicitly conjecture that whoever creates such an AI will know perfectly well how to formalize capabilities such as <become superhumanly good at mathematics>, while at the same time failing selectively at making it solve the mathematics they want it to solve, instead causing it to solve the mathematics necessary to kill all humans.

If you claim that it is possible to define the capability <become superhumanly good at mathematics>, then you need a very good argument to support the claim that it is at the same time difficult to define goals such as <build a house> without causing human extinction.

Tags: ,

As an addendum to my last post, I want to note that incorrect answers by IBM Watson are not comparable to human extinction occurring as a side effect or failure of an artificial general intelligence.

Watson was supposed to win at Jeopardy! and succeeded. Artificial general intelligence, by contrast, is conjectured to work sufficiently well while not doing what it is supposed to do.

You need a lot of ingenuity to cause human extinction. Your artificial general intelligence will have to work perfectly, exactly as it was intended to work. A perfectly working machine, however, does not commit mistakes such as causing human extinction in order to win at Jeopardy!, as long as it was not explicitly built to do so.

IBM Watson committed mistakes. An artificial general intelligence that is supposed to outsmart humanity has a very small margin for error. If an artificial general intelligence were prone to committing errors on the scale of confusing goals such as <win at Jeopardy!> with <kill all humans>, then it would never succeed at killing all humans, because it would make similar mistakes on the wide variety of problems that have to be solved in order to do so.

Tags: ,
