existential risks


It is predicted that artificial general intelligence (short: AI) constitutes an existential risk.

Below is a comparison chart that I believe reflects what AI risk advocates claim about how a general artificial intelligence differs from a narrow artificial intelligence in how it will behave given the same task.


Comparison Chart: Narrow vs. General Artificial Intelligence

(According to AI risk advocates.)

Narrow artificial intelligence will be denoted NAI and general artificial intelligence GAI.

(1) Is it in principle capable of behaving in accordance with human intention to a sufficient degree?

NAI: True

GAI: True

(2) Under what circumstances does it fail to behave in accordance with human intention?

NAI: If it is broken, where broken stands for a wide range of failure modes such as incorrectly managing memory allocations.

GAI: In all cases in which it is not mathematically proven to be tasked with the protection of, and equipped with, a perfect encoding of all human values or a safe way to obtain such an encoding.

(3) What happens when it fails to behave in accordance with human intention?

NAI: It crashes, freezes or halts. It generally fails in a way that is harmful to its own functioning. If for example an autonomous car fails at driving autonomously it usually means that it will either go into safe-mode and halt or crash.

GAI: It works perfectly well. Superhumanly well. All its intended capabilities are intact except that it completely fails at working as intended in such a way as to destroy all human value in the universe. It will be able to improve itself and capable of obtaining a perfect encoding of human values. It will use those intended capabilities in order to deceive and overpower humans rather than doing what it was intended to do.

(4) What happens if it is bound to use a limited amount of resources, use a limited amount of space or run for a limited amount of time?

NAI: It will only ever do what it was programmed to do. As long as there is no fatal flaw harming its general functionality, it will work within the defined boundaries as intended.

GAI: It will never do what it was programmed to do and always remove or bypass its intended limitations in order to pursue unintended actions such as taking over the universe.
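The contrast drawn in point (4) can be made concrete with a minimal sketch, in Python, of how a narrow system typically handles a resource limit (all names are hypothetical; this only illustrates the NAI column, not any particular system):

```python
import time

def bounded_search(candidates, score, time_budget_s=0.1):
    """A narrow optimizer with a hard resource bound: it evaluates candidates
    until the time budget runs out, then returns the best result found so far.
    The limit is part of the program; there is no machinery for lifting it."""
    deadline = time.monotonic() + time_budget_s
    best, best_score = None, float("-inf")
    for c in candidates:
        if time.monotonic() > deadline:
            break  # budget exhausted: stop, exactly as programmed
        s = score(c)
        if s > best_score:
            best, best_score = c, s
    return best

# Within a generous budget this finds the candidate closest to 42.
result = bounded_search(range(100), lambda x: -abs(x - 42))
assert result == 42
```

The chart's GAI column asserts that the same kind of bound would instead be treated as an obstacle to remove, which is precisely the claim the rest of this post asks to see substantiated.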


The current beliefs of most experts in the field of AI do not seem to support the view that the behavior outlined in the chart above is a likely outcome. See for example Peter Norvig, 2012:

Personally, I think that the last invention we need ever make is the partnership of human and tool. Paralleling the move from mainframe computers in the 1970s to personal computers today, most AI systems went from being standalone entities to being tools that are used in a human-machine partnership.

Our tools will get ever better as they embody more intelligence. And we will become better as well, able to access ever more information and education. We may hear less about AI and more about IA, that is to say “intelligence amplification”. In movies we will still have to worry about the machines taking over, but in real life humans and their sophisticated tools will move forward together.

I believe that AI risk advocates need to provide a lot of technical details and specific arguments to support the above chart. Arguing by definition alone is insufficient. The behavior outlined above has to be shown not only to be in principle possible but to be a probable result of actual research and development.

What is it that makes a general intelligence, as opposed to a narrow intelligence, behave in such a way as to result in human extinction?

What can be said about a general intelligence that can’t be said about a narrow intelligence such as IBM Watson? Both systems can be interpreted, implicitly, to have a utility function. And even a thermostat could be interpreted to have a terminal goal. Yet a narrow intelligence, an expert system, is characterized as achieving its goal, while a generally intelligent agent is characterized as achieving its goal and, in addition, pursuing activities that will cause human extinction.
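The thermostat point can be illustrated with a minimal Python sketch (all names hypothetical): a plain control rule next to the utility function one could retroactively ascribe to it.

```python
def thermostat_action(temp, setpoint=21.0):
    """Plain control rule: heat when too cold, cool when too warm."""
    if temp < setpoint - 0.5:
        return "heat"
    if temp > setpoint + 0.5:
        return "cool"
    return "idle"

def ascribed_utility(temp, setpoint=21.0):
    """A utility function one could retroactively ascribe to the thermostat:
    it 'maximizes' the negative squared deviation from the setpoint."""
    return -(temp - setpoint) ** 2

# The device just follows its rule; calling it a utility maximizer is a
# description we impose on it, not a mechanism inside the device.
assert thermostat_action(18.0) == "heat"
assert ascribed_utility(18.0) < ascribed_utility(21.0)
```

The ascription adds nothing to our ability to predict the device's behavior, which is why "has a utility function" alone cannot be what separates the dangerous system from the harmless one.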


Related to: AI drives vs. practical research and the lack of specific decision procedures

It has recently been argued that criticisms have failed to puncture the arguments in favor of artificial general intelligence being an existential risk (short: AI risks). I vehemently disagree with this assessment and further claim that the arguments in favor of AI risks have so far been thoroughly unconvincing.

It also seems that it is not just me who is unconvinced by the existing arguments in favor of AI risks. How many experts in a field related to AI have been convinced by AI risk arguments? And if none of the relevant experts have ever been exposed to those arguments, then I have to conclude that the arguments have, either deliberately or carelessly, been shielded from real-world feedback loops.

Below I list four, not necessarily independent, caveats against AI risks that would be valid even if one were to accept (1) that AI will be invented soon enough to be decision-relevant at this point in time, (2) that the kind of uncontrollable recursive self-improvement imagined by AI risk advocates is even in principle possible, and (3) that the advantage of greater intelligence scales with the task of taking over the world in such a way that it becomes probable that an AI will succeed in doing so, even given the lack of concrete scenarios for how that is supposed to happen.

(1) An AI is not pulled at random from mind design space. An AI is the result of a research and development process. A new generation of AIs needs to be better than other products at “Understand What Humans Mean” and “Do What Humans Mean” in order to survive the research phase and subsequent market pressure.

(2) An AI will only ever do what it has been explicitly programmed to do. An AI is not going to protect its utility-function, acquire resources or preemptively eliminate obstacles in an unbounded fashion, because it is not intrinsically rational to do so. What specifically constitutes rational, economic behavior is inseparable from an agent’s terminal goal. That any terminal goal can be realized in an infinite number of ways implies an infinite number of instrumental goals to choose from.

(3) Commercial, research or military products are created with efficiency in mind. An AI that was prone to take unbounded actions given any terminal goal would either be fixed or abandoned during the early stages of research. If early stages showed that inputs such as the natural language query <What would you do if I asked you to minimize human suffering?> would yield results such as <I will kill all humans.> then the AI would never reach a stage in which it was sufficiently clever and trained to understand what results would satisfy its creators in order to deceive them.

(4) Unintended consequences are by definition not intended. They are not intelligently designed but detrimental side effects, failures. Whereas intended consequences, e.g. in the case of an artificial intelligence, such as acting intelligently, are intelligently designed. If software was not constantly improved to be better at doing what humans intend it to do we would never be able to reach a level of sophistication where a software could work well enough to outsmart us. To do so it would have to work as intended along a huge number of dimensions. For an AI to constitute a risk as a result of unintended consequences those unintended consequences would have to have no, or little, negative influence on the huge number of intended consequences that are necessary for it to be able to overpower humanity.

For more arguments and many more caveats, including elaborations of the arguments above, see here.


Related to: AI vs. humanity and the lack of concrete scenarios, Questions regarding the nanotechnology-AI-risk conjunction, AI risk scenario: Deceptive long-term replacement of the human workforce, AI drives vs. practical research and the lack of specific decision procedures, AI risk scenario: Elite Cabal, AI risk scenario: Social engineering, AI risk scenario: Insect-sized drones

Objective: Remarks and questions about the use of biological toxins or infectious agents by an artificial general intelligence (short: AI) to decisively weaken and eventually overpower humanity.

Remarks: In comparison to other scenarios (see the related links above), it seems difficult to determine the culprit behind such an attack. That difficulty could allow the AI to be safe from human counter-strikes.

Furthermore, the use of biological weapons does not rely on highly speculative or unspecified technological magic.


(1.0) At the time of the first AI, how dependent will the global infrastructure be on a functioning human society and how dependent will the AI be on that infrastructure to eventually achieve whatever terminal goal it might have?

(2.0) Assuming that an AI is not infrastructure-dependent, what is the expected utility, given the instrumental goal of taking over the world, of trying to design biological weapons?

(2.1) How likely is an AI to be able to acquire the necessary infrastructure, design and release a suitable biological weapon without being caught? Consider that the danger of such weapons is already widely known and taken seriously.

(2.2) How likely is biological warfare to weaken humanity to such an extent that a thorough investigation of the causes and the ultimate revelation and termination of the culprit will be rendered impossible?

(2.3) At the time of the first AI, what will be the state of biodefense, how likely are biological weapons to remain undetected for a sufficient amount of time and how likely is their neutralization to pose a major difficulty?


Related to: AI vs. humanity and the lack of concrete scenarios, Questions regarding the nanotechnology-AI-risk conjunction, AI risk scenario: Deceptive long-term replacement of the human workforce, AI drives vs. practical research and the lack of specific decision procedures, AI risk scenario: Elite Cabal, AI risk scenario: Social engineering

Objective: Some remarks and questions about a scenario outlined by Tyler Cowen in which insect-sized drones are used to kill people or to carry out terror attacks.

The scenario:

Not bee drones, rather drone drones, with military and terrorist capabilities.  There is already a (foiled) terror plot using model airplanes.  How easy would it be to stop a mechanical “bee” which injects a human target with rapidly-acting poison?


From my arguably naive layman’s perspective, such a scenario seems scary and not too unrealistic, even in the absence of a rogue superhuman artificial general intelligence (short: AI) trying to overpower humanity.

The scenario does not seem to require advanced molecular nanotechnology or generally rely on any kind of far-fetched or unspecified technological magic that an intelligence greater than that of humans might invent.

An AI would be capable of controlling a huge number of such drones in a goal-directed manner, either remotely or by implementing autonomous proxies of itself. Those drones could then be used to physically manipulate the environment or to possibly create wireless networks that are independent of the global infrastructure.


(1) Could a huge number of such drones eventually overpower humanity and what number would be sufficient to accomplish that goal?

(2) How would an AI manage to unsuspiciously produce a huge number of such drones or how likely is such a number of drones to already be available to the AI and suitable for the purpose of taking over the world?

(3) How quickly could an AI overpower humanity using such drones before humans could intervene, e.g. by the use of electromagnetic pulses to disable the drones?

(4) How likely is such an AI to exist before wide-ranging security measures against macro-drones have been implemented due to their previous use by governments and/or terrorists?


Related to: AI vs. humanity and the lack of concrete scenarios, Questions regarding the nanotechnology-AI-risk conjunction, AI risk scenario: Deceptive long-term replacement of the human workforce, AI drives vs. practical research and the lack of specific decision procedures, AI risk scenario: Elite Cabal

Objective: Some remarks and questions about a scenario outlined in the LessWrong post ‘For FAI: Is “Molecular Nanotechnology” putting our best foot forward?‘ on how an artificial general intelligence (short: AI) could take control of Earth by means of social engineering, rigging elections and killing enemies.

The scenario:

I’m fully convinced that a smarter than human AI could take control of the Earth via less magical means, using time tested methods such as manipulating humans, rigging elections, making friends, killing its enemies, and generally only being a marginally more clever and motivated than a typical human leader. A smarter than human AI could out-manipulate human institutions and out-plan human opponents with the sort of ruthless efficiency that modern computers beat humans in chess.


First of all, I agree with the following part of the post:

I have a hard time believing the only reason you can’t make a nanoassembler capable of arbitrary manipulations out of a handful of bottles you ordered from Sigma-Aldrich is because we’re just not smart enough.


I assume the reason that MNT is added to a discussion on AI is because we’re trying to make the future sound more plausible via adding burdensome details. I understand that AI and MNT is less probable than AI or MNT alone, but that both is supposed to sound more plausible.

As I have already outlined in a previous post, that scenario raises several questions.

Secondly, the following quote (emphasis mine) from the post linked to above is a great example of what I have been talking about in other posts:

If I had read the chain of reasoning smart computer->nanobots before I had built up a store of good-will from reading the Sequences, I would have almost immediately dismissed the whole FAI movement a bunch of soft science fiction, and it would have been very difficult to get me to take a second look.

But all of that is digressing from the objective of this post, namely asking some questions about the AI risk scenario that the author finds convincing.

When people tell me how an AI could somehow deceive and manipulate humans, I like to ask them to first imagine a whole brain emulation of a grown-up human, and how it could become a pickup artist (a man who is skilled in the art of finding, attracting, and seducing women), instead of an AI trying to become good enough at social engineering to take over the world. I think it is sensible to assume that such an emulation might need an avatar, or at least a lot of real-world feedback, before it could become sufficiently skilled to easily seduce women.

Once you notice that even for the emulation of a grown-up adult it will be nontrivial to become a good pickup artist, the next step is to imagine how the emulation of a human toddler is going to acquire those skills, before eventually trying to think about how an abstract AI that lacks all of the hard-coded capabilities of a human toddler is going to do it.


(1) How likely is an AI not to accidentally reveal its intention of taking over the world before it manages to become sufficiently good at assessing and manipulating humans to do so?

(2) How susceptible is a social-engineering strategy for world domination to whistleblowers?

(3) How likely is an AI to be able to manipulate the elections of a country in such a way that the instrumental usefulness for taking over the world outweighs the risk of being caught? Consider that there are pre-election polls and a lot of non-electronic oversight. Also consider that most democratic leaders have very little control (see e.g. ‘Lame duck’).

(4) How would an AI make friends? By writing emails?

(5) How strong a form of control is friendship, such that it would be instrumentally useful in taking over the world? Consider how fragile human friendship is, being susceptible to small perturbations.

(6) By what means is the AI going to kill its enemies and how useful is that going to be? Consider how difficult it is even for a world power such as the USA to effectively kill its enemies.


Related to: AI vs. humanity and the lack of concrete scenarios, Questions regarding the nanotechnology-AI-risk conjunction, AI risk scenario: Deceptive long-term replacement of the human workforce, AI drives vs. practical research and the lack of specific decision procedures

Objective: Some remarks and questions about a scenario outlined by Mitchell Porter (source) on how an existential risk scenario involving advanced artificial general intelligence (short: AI) might be caused by a small but powerful network of organizations working for a great power in the interest of national security.

Mitchell Porter’s Elite Cabal:

…if we are interested in likely concrete scenarios, we should be considering something like: national-security elite of great power “X” have access to AI breakthroughs taken from the civilian world and then pushed over the edge by well-funded covert computer scientists. So the feedback loop of self-enhancement is not occurring solely within one single self-modifying program, but within a small but powerful network of organizations, whose value system is the “national interest” of one country…

Assumptions: I assume that we are still talking about an eventual technological existential risk scenario where some sort of artificially intelligent agency plays a role. Given that assumption, the cabal behind this scenario must (1) be coherent enough to grasp the power of such a technology, in order for it to be funded, (2) be smart enough to put all the pieces together, (3) fail to notice that something they believe to be very powerful could be very dangerous, (4) succeed at creating such a powerful technology, and (5) fail in such a way that, in order to be powerful, the technology works perfectly well along a huge number of dimensions, yet ends up deceiving and overpowering humanity.

Remarks: I like to analogize such a scenario to the creation of a generally intelligent autonomous car that works perfectly well at not destroying itself in a crash but which somehow manages to maximize the number of people it runs over.

The failure mode, the mistake, would have to be selective enough to only influence one or a few dimensions of how such an artificial general intelligence is supposed to work, causing it to fail in a highly complex, intelligent, rational, yet catastrophically destructive way, while being indiscernible during the research and development process, i.e. before it reaches the ability to influence the world in such a way.

For an artificial general intelligence to constitute a risk as a result of unintended consequences those unintended consequences would have to have no, or little, negative influence on the huge number of intended consequences that are necessary for it to be able to overpower humanity.

Now some people might object that the specific failure mode will be the emergence of certain instrumental goals, given a wide range of terminal goals, that cause an artificial general intelligence to fail in a catastrophic way, while an expert system would achieve similar goals as intended or fail completely.


(1) How likely is the conjunction of having a group of people who are smart enough to create an AI that is capable of taking over the world but who fail to predict the possible emergence of such instrumental drives, given that such drives have already been predicted by people who were not capable of creating such an AI? Consider that such a group would also have to be highly rational, because a powerful AI would itself have to be equipped with a good formalization of epistemic and instrumental rationality to be powerful in the first place.

(2) How would the AI initially manage to hide any suspicious signs of working against the intentions of its creators given that during its initial stages it will either still be a sub-human intelligence or lack certain skills?

(3) If such a cabal acts in the interest of national security, how likely are they to ignore possible risks associated with such a technology or fail to take preemptive security measures? Consider that great powers have a lot of practice from dealing with other dangers such as biological weapons.

(4) How likely is it that a group funded by the government in the interest of national security would not be highly suspicious of any data traffic or other actions that they are unable to explain?

For other related questions see the previous posts linked to above or see the points outlined here.


Related to: AI vs. humanity and the lack of concrete scenarios, Questions regarding the nanotechnology-AI-risk conjunction

Objective: Some questions about a scenario related to the possibility of an advanced artificial general intelligence (short: AI) overpowering humanity. For the purpose of this post I will label the scenario a deceptive long-term replacement of the human workforce. As with all such scenarios it makes sense to take a closer look by posing certain questions about what needs to be true in order for a given scenario to work out in practice and to be better able to estimate its probability.

Deceptive long-term replacement of the human workforce:

If I could upload myself, limitlessly multiply, and run myself a hundred times faster than my human brain allows, I wouldn’t need new technological advancements like nanotechnology or even a qualitatively superior intelligence before I took over the world. I’d just make myself available (via bittorrent perhaps) and eventually irreplaceable to every business and endeavour in the world, by being a hundred times more productive and far cheaper than any individual non-menial worker (from bank accountants all the way to CEOs and diplomats) ever could. Once I’ve replaced all such professionals with myself, well, technically I’d be already in control of the world, but I could also change my visible goals without anyone having retained ability to give me significant resistance.

No new technology required.

The above description is due to Aris Katsaris (source).

Note: For the purpose of this post I will substitute artificial general intelligence for brain emulation. Assuming brain emulation instead of AI would require a different analysis as questions about the importance of prerequisite technologies such as nanotechnology would have to be posed and the difficulty of effective cooperation due to value drift would have to be examined differently.


(1) How likely is the instrumental goal of overpowering humanity to emerge in a general purpose AI (i.e. an AI design with a utility-function defined in such a way as to learn about and satisfy each customer’s intentions)? More on this here: AI drives vs. practical research and the lack of specific decision procedures

(2) How likely is the goal of overpowering humanity (which naturally includes the tendency to deceive humans about this intention) to go unnoticed during the research and development phase in which the AI is not yet fully developed and therefore more prone to failures (i.e. not very good at deceiving humans yet)?

(3) How likely is the AI to be modified for each purpose and customer and still retain a coherent plan of how to overpower humanity that allows it to effectively conspire with other copies without anyone noticing it?

(4) How likely are elites, large companies and governments to trust a replacement of their workforce without demanding an inspection of the software and a thorough risk analysis, possibly constraining or even reversing such a replacement? As an example, consider that China demanded that Microsoft disclose the source code of its operating system Windows (source).

(5) How likely are suspicious activities to go unnoticed by security experts, third-party AI researchers, hackers or concerned customers?

(6) What happens if different customers employ their AI for purposes that are detrimental to the overall goal of overpowering humanity, such as proving the AI’s source code to be safe in a way that is verifiable by humans or inventing provably safe security protocols to protect crucial infrastructure from misuse or sabotage?

(7) How likely are people to be comfortable with replacing humans in power, such as politicians and other decision makers, with such software?

(8) How likely is such software to overpower humanity before other players manage to release their own general purpose AIs as competitors, possibly constraining its influence or uncovering or thwarting its plan for world domination?


Related to: AI vs. humanity and the lack of concrete scenarios

Objective: Posing questions examining what I call the nanotechnology-AI-risk conjunction, by which I am referring to a scenario that is often mentioned by people concerned about the idea of an artificial general intelligence (short: AI) attaining great power.

Below is a quote that is outlining the scenario in question (source: Intelligence Explosion Microeconomics, page 6.):

The first machine intelligence system to achieve sustainable returns on cognitive reinvestment is able to vastly improve its intelligence relatively quickly—for example, by rewriting its own software or by buying (or stealing) access to orders of magnitude more hardware on clustered servers. Such an AI is “prompt critical”— it can reinvest the fruits of its cognitive investments on short timescales, without the need to build new chip factories first. By the time such immediately accessible improvements run out, the AI is smart enough to, for example, crack the problem of protein structure prediction. The AI emails DNA sequences to online peptide synthesis labs (some of which boast a seventy-two-hour turnaround time), and uses the resulting custom proteins to construct more advanced ribosome-equivalents (molecular factories). Shortly afterward, the AI has its own molecular nanotechnology and can begin construction of much faster processors and other rapidly deployed, technologically advanced infrastructure. This rough sort of scenario is sometimes colloquially termed “hard takeoff ” or “AI-go-FOOM.”

A preliminary remark: If your AI relies on molecular nanotechnology to attain great power then the probability of any kind of AI attaining great power depends on factors such as the eventually attainable range of chemical reaction cycles, error rates, speed of operation, and thermodynamic efficiencies of such bottom-up manufacturing systems. To quote a report of the U.S. National Academy of Sciences in this regard (source):

… the eventually attainable perfection and complexity of manufactured products, while they can be calculated in theory, cannot be predicted with confidence. Finally, the optimum research paths that might lead to systems which greatly exceed the thermodynamic efficiencies and other capabilities of biological systems cannot be reliably predicted at this time. Research funding that is based on the ability of investigators to produce experimental demonstrations that link to abstract models and guide long-term vision is most appropriate to achieve this goal.

Assumptions: For the purpose of the following questions I will assume (1) that the kind of nanotechnology known from science fiction is in principle possible, (2) that an advanced artificial general intelligence is required to invent such technology, and not vice versa (in which case we should be worried about nanotechnology instead), and (3) that any given AI would want to create molecular nanotechnology without this being an explicitly defined terminal goal (for more on this see: ‘AI drives vs. practical research and the lack of specific decision procedures‘).

Questions: A few initial questions that need to be answered in order to estimate the probability of the nanotechnology-AI-risk conjunction conditional on the above assumptions being true.

(1.0) How likely is an AI to be given control of the equipment initially necessary to construct molecular factories?

(1.1) How likely are an AI’s creators to let their AI do unsupervised research on molecular nanotechnology? Consider that possible risks associated with advanced nanotechnology are already widely known and taken seriously.

(1.2) How likely is an AI to use its initial infrastructure to succeed at doing covert research on molecular nanotechnology, without its creators noticing it?

(2.0) How likely is an AI to acquire useful long-term control of the equipment necessary to construct molecular factories without anyone noticing it?

(3.0) How likely is it that an AI manages to turn its discoveries into infrastructure and/or tools that are instrumentally useful to deceive or overpower its creators while its creators or third parties are still able to intervene and stop the AI?

All of the above questions can be broken up into many more detailed questions, while many additional questions remain unasked. But I believe that those questions are a good starting point.


Objective: (1) Outlining how to examine the possibility of the emergence of dangerous goals in generally intelligent systems in the light of practical research and development. (2) Determining what decision procedures would cause generally intelligent systems to exhibit catastrophic side effects. 

There are arguments supporting the possibility that an advanced artificial general intelligence (short: AI) might exhibit specific universal drives which could interfere with human matters in catastrophic ways. It is for example argued that <self-protection> is important in order to achieve a wide range of goals. It is not my intention to discuss those arguments in particular but rather to look at various goals and how likely it is that different AI designs might follow decision procedures that cause them to exhibit catastrophic side effects given those goals.

To examine the possibility that a wide range of AI designs might exhibit catastrophic side effects, given a wide range of goals, several factors have to be considered. Factors such as (1) a cost-benefit analysis of interfering with human matters, (2) the necessity of a spatiotemporal planning horizon, given computationally limited agents, possibly limiting unbounded protectionism, and (3) how any given goal is interpreted given vagueness and uncertainty.

Simple and complex natural language goals such as <calculate 1+1> and <keep the trains running> should be examined to see whether to expect more dangerous outcomes with more complex goals or vice versa.

Various questions should be asked to pinpoint the expected failure mode:

(1.0) How is a goal likely to be interpreted by an AI design: (1) arbitrarily (2) verbatim (3) as a problem in physics and mathematics that needs to be solved correctly?

(1.1) If an AI is interpreting a goal arbitrarily, how does it choose one interpretation over another?

(1.2) What does it mean for an AI to interpret a goal literally? Suppose the goal given is <build a hotel>. Is the terminal goal to create a hotel that is just a few nanometers in size? Is the terminal goal to create a hotel that reaches the orbit?

(1.3) If an AI design is going to interpret a goal as a problem in mathematics and physics, would it make sense to ignore various important facts about the universe, such as what its creators intended it to do? Would it make sense to simply assume the most resource-expensive interpretation and very likely end up doing more than necessary?

(2.0) What instrumental goals are implied by a terminal goal when interpreted by a specific AI design?

(2.1) Does a cost benefit analysis imply that it would be rational to take over the world?

(2.2) Would taking over the world, or some other far-reaching action, make sense if it is not even clear that it is instrumentally rational to allocate massive resources to do so? Does it for example make sense to build a bunker and kill all humans to make sure that you are unobstructed in calculating 1+1? Or would it make sense to turn everyone into paperclips if you are only supposed to create more paperclips than the best competitor without interfering with the world at large?

(4.0) If a specific AI design does exhibit catastrophic side effects given a goal that present-day software tools can master with ease, such as the calculation of a driving route from Los Angeles to San Francisco by Google Maps, is it possible to pinpoint what specifically causes that AI design to fail in such a way, and why its creators did not foresee that failure mode?

(4.1) If you were to alter a narrow AI expert system such as Google Maps and incrementally turn it into the kind of AI design that you expect to exhibit catastrophic side effects, given the same goal as the expert system, can you locate the tipping point at which the well-behaved expert system starts to act in a catastrophic yet highly complex and intelligent way?
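The intuition behind question (2.2) can be made concrete with a toy expected-utility comparison. All numbers below are made up purely for illustration; the point is only that a bounded goal caps the achievable payoff, so any enormously costly takeover strategy is dominated by the direct one:

```python
# Toy cost-benefit sketch for question (2.2). All figures are invented
# for illustration; nothing here models a real agent.

def expected_utility(p_success: float, payoff: float, cost: float) -> float:
    """Probability-weighted payoff minus the cost of pursuing the strategy."""
    return p_success * payoff - cost

# Strategy A: just calculate 1+1 (near-certain success, negligible cost).
just_compute = expected_utility(p_success=0.999, payoff=1.0, cost=1e-9)

# Strategy B: build a bunker and remove all possible obstruction first
# (success not much more certain, cost enormous relative to the capped payoff).
take_over_first = expected_utility(p_success=0.9999, payoff=1.0, cost=1e6)

print(just_compute > take_over_first)  # True
```

Under these assumptions the takeover strategy only becomes attractive if the payoff itself is unbounded, which is one way to restate the question of whether bounded goals are inherently safer.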

Example 01: 

Assume an ultra-advanced version of Google or IBM Watson.

If I were to ask such an answering machine how to prevent human suffering, would it be reasonable to assume that the top result it returns would be to kill all humans? Would any product that returns similarly wrong answers survive even the earliest research phase, let alone any market pressure?

Example 02: 

Assume an ultra-advanced version of Siri, the intelligent personal assistant and knowledge navigator that works as an application for Apple’s iOS.

If I tell the present-day version of Siri, “Set up a meeting about the sales report at 9 a.m. Thursday.”, then the correct interpretation of that natural language request is to make a calendar appointment for 9 a.m. Thursday. A wrong interpretation would be, for example, to open a webpage about meetings happening Thursday or to shut down the iPhone.
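The distinction between the intended interpretation and a plausible misinterpretation can be sketched as a toy intent classifier. This is not Siri’s actual pipeline; the function name, intents and keyword rules are all invented for illustration:

```python
# Toy illustration of intent selection (not Siri's real implementation):
# mapping a natural language request to one of several candidate intents.

def interpret(request: str) -> str:
    """Return a hypothetical intent label for a request."""
    text = request.lower()
    if "meeting" in text and ("a.m." in text or "p.m." in text):
        return "create_calendar_appointment"   # the intended interpretation
    if "meeting" in text:
        return "search_web_for_meetings"       # a plausible misinterpretation
    return "unknown"

print(interpret("Set up a meeting about the sales report at 9 a.m. Thursday."))
# create_calendar_appointment
```

The fragility is obvious: drop the time expression and the rule-based system silently falls back to the wrong intent, which is exactly the kind of failure mode the question below is about.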

The question here is at what point in technological development there will be a transition from well-behaved systems like Siri, which are able to interpret a limited range of natural language inputs correctly, to superhuman, generally intelligent systems that are in principle capable of understanding any human conversation but which, in contrast to their narrow AI counterparts, fail in catastrophic ways.


About this post: This post is supposed to be a preliminary outline of how to analyze concrete scenarios in which an advanced artificial general intelligence attempts to transform Earth in a catastrophic way.

Objective: Analyzing concrete scenarios helps to (1) better estimate the probability of possible catastrophic side effects associated with the invention of an advanced artificial general intelligence and (2) design preemptive security measures.

Assumptions: For the purpose of this post I will assume an artificial general intelligence (short: AI) that is, very roughly, more intelligent than all of humanity and can process a greater amount of knowledge in a shorter period of time. I further assume that this agent cares about using all of the resources in the solar system for some goal unrelated to human values. Humans are considered a mere resource.

The question: Does such an AI constitute an existential risk? In other words, will such an AI cause human extinction?

This question can obviously be answered in the affirmative if such an AI is likely to achieve its goal. But how do we determine the probability of such a scenario? We have to carefully look at how such an AI could manage to defeat humanity (take over the world).

We have to pay attention to a lot of factors if we want to determine concrete scenarios of how an AI could overpower humanity and how probable each scenario is, factors such as (1) the AI’s fragility to human counterstrikes, (2) its dependency on the global infrastructure, and (3) its ability to take control of external resources and to keep hold of those resources while remaining productive.

One of the most important questions is how the advantage of greater intelligence scales with the task of taking over the world.

If we consider simple games such as Tic-tac-toe, we can definitely say that superhuman intelligence would not be instrumentally useful at beating humans. You also won’t get a practical advantage by throwing more computational resources at the travelling salesman problem and other problems in the same complexity class. The same might be said about improving a conversation in your favor by refining each sentence for thousands of years of subjective time: you will quickly hit diminishing returns, especially if you lack the data to predict human opponents accurately.
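The travelling-salesman point can be made concrete. Under brute-force search, the number of candidate tours grows factorially with the number of cities, so additional computational resources buy only a handful of extra cities before the search space explodes. A minimal sketch, counting tours with a fixed starting city and treating reversed tours as identical:

```python
# Why brute force on the travelling salesman problem hits a wall:
# the number of distinct tours grows as (n-1)!/2.

from math import factorial

def tour_count(n_cities: int) -> int:
    """Distinct tours for n cities (fixed start, direction-symmetric)."""
    return factorial(n_cities - 1) // 2

for n in (10, 15, 20, 25):
    print(n, tour_count(n))
# 10 cities already give 181,440 tours; 25 cities give more than 10^23.
```

Going from 20 to 25 cities multiplies the search space by a factor of over a million, which is the sense in which raw compute does not translate into a proportional practical advantage here.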

Another example is due to Holden Karnofsky (source):

I find it somewhat helpful to analogize UFAI-human interactions to human-mosquito interactions. Humans are enormously more intelligent than mosquitoes; humans are good at predicting, manipulating, and destroying mosquitoes; humans do not value mosquitoes’ welfare; humans have other goals that mosquitoes interfere with; humans would like to see mosquitoes eradicated at least from certain parts of the planet. Yet humans haven’t accomplished such eradication…

Example scenario: Inventing new technologies to overpower humanity.

Consider that we are already at a point where we have to build billion-dollar chip manufacturing facilities to run our mobile phones. We need to build huge particle accelerators to obtain new insights into the nature of reality. It takes a whole technological civilization to produce a modern smartphone.

In order to come up with new technologies, an AI would somehow have to acquire large amounts of money. And even if it manages to do so, it is not easy to use that money. You can’t “just” build huge companies with fake identities, or use a straw man, to create revolutionary technologies easily. Running companies with real people takes a lot of real-world knowledge, interaction and feedback. But most importantly, it takes a lot of time. How likely is it that an AI could simply create a new Intel or Apple within a few years without its creators noticing anything?

Further questions:

  • What is the net advantage of eidetic memory if you consider that humans can use tools to achieve effectively the same thing?
  • What advantage does an AI have over humans who can extend their working memory using tools? We can make a certain kind of psychological distinction between things we can hold in our mind without tools and things we can’t. Does this mean there is some radical qualitative advantage (as opposed to the obvious speed advantages) in increasing the capacity of working memory? If an AI that we invented can hold a complex model in its mind, then we can also simulate such a model by making use of expert systems. Does being consciously aware of the model make a great difference in principle to what you can do with the model? If your brain had a 1000 times larger working memory, would you be better at problem solving? Probably. Would you be 1000 times better?
  • What is the advantage of more serial power? Do important problems related to taking over the world fall into complexity classes where throwing more computational resources at a problem does not lead to diminishing returns? Increases in raw processing power don’t translate into proportional increases in actual utility. Your brand-new PC does not improve your life twice as much as the PC you bought 18 months ago.
  • What is the advantage of parallel computation? It is not clear how many tasks are easily decomposable into smaller operations. Consider that the U.S. has many more, and smarter, people than the Taliban. The bottom line is that the U.S. devotes a lot more output per man-hour to defeating a completely inferior enemy, yet its advantage scales sublinearly.
  • What evidence do we have that most evolutionary designs are vastly less efficient than their technological counterparts? A lot of the apparent advantages of intelligent design are the result of making questionable comparisons, like between birds and rockets. We haven’t been able to design anything that is nearly as efficient as natural flight. It is true that artificial flight can overall carry more weight. But just because a train full of hard disk drives has more bandwidth than your internet connection does not imply that someone with trains full of HDDs would be superior at data transfer.
  • What is the advantage of copying? The first artificial general intelligence might be a state-of-the-art technology that runs on state-of-the-art hardware, rather than one AI in a huge ecosystem of different AIs that run everywhere from smartphones to personal computers. To imagine that an AI could simply copy itself would be similar to imagining that IBM’s Blue Brain Project could simply be copied in such a way that not only would nobody notice the unexpected use of bandwidth and the surge in everyone’s CPU load, but that it would run effectively enough to make it worthwhile to take the risk of detection and increased instability from using highly volatile infrastructure that was never adapted to run such software. Further consider that a collective of humans and their tools can also think much faster than a single human being. Yet how great is the advantage? Sometimes a single human being can outsmart humanity. Yet humanity can kill a single human being. What does this indicate about the relation between (1) greater intelligence, (2) faster thinking and (3) greater power?
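The train-of-HDDs comparison from the list above is easy to check with back-of-the-envelope arithmetic. All figures here are assumptions chosen for illustration (a hypothetical cargo of 100,000 drives of 4 TB each, a ten-hour trip, and a 1 Gbit/s connection), not measurements:

```python
# Back-of-the-envelope bandwidth comparison; all inputs are assumed values.

def throughput_bits_per_s(total_bytes: float, seconds: float) -> float:
    """Average throughput of moving total_bytes in the given time."""
    return total_bytes * 8 / seconds

# Assumed: 100,000 drives * 4 TB each, delivered over a 10-hour trip.
train = throughput_bits_per_s(100_000 * 4e12, 10 * 3600)

# Assumed: a 1 Gbit/s internet connection.
internet = 1e9

print(train / internet)  # the train "wins" on raw bandwidth by roughly 10^5
```

Raw bandwidth is only one axis, of course: the comparison ignores latency and the practicality of actually using the delivered data, which is exactly the point of the analogy.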
