What I would like AI risk advocates to publish

“… pointing out that something scary is possible, is a very different thing from having an argument that it’s likely.”

— Ben Goertzel, The Scary Idea (and Why I Don’t Buy It)




I’m wary of using inferences derived from reasonable but unproven hypotheses as foundations for further speculative thinking and calls for action. Although AI risk advocates[1] do a good job of stating reasons to justify their mission and monetary support, they neither substantiate their initial premises to an extent that would allow an outsider to draw action-relevant conclusions, nor clarify their predictions in a concise and systematic way. Nevertheless, predictions are being made, such as that there is a high likelihood of humanity’s demise if we develop superhuman artificial general intelligence without first defining mathematically how to prove and guarantee its benevolence. But those predictions are not sufficiently supported, and no decision procedure is provided for arriving at those conclusions and being sufficiently confident of their correctness. This, I believe, is unsatisfactory: it lacks transparency and does not allow for reassessment. This is not to say that they are wrong to make predictions, but that although those ideas can very well serve as an urge to caution, they are not compelling without further substantiation.

AI risk advocates have to set themselves apart from works of science fiction and actually provide some formal analysis of what we know, what conclusions can be drawn, and how they relate to predictions about risks associated with artificial general intelligence. There needs to be a risk-benefit analysis that shows why AI risk mitigation is the best charitable cause, along with a way to reassess the results yourself.


AI risk advocates have created a highly complicated framework of speculations that support and reinforce each other.[2]

Although I can follow much of the reasoning and arguments, I’m currently unable to judge their overall credence. Are the conclusions justified? Are the arguments based on firm ground? Would their arguments withstand a critical inspection or examination by a third party, peer review? Are their estimations reflectively stable? How large is the model uncertainty? There is too much vagueness involved to tell.

Are AI risk advocates able to analyse the reasoning that led them to research friendly AI in the first place, or at least substantiate their estimations with other kinds of evidence than a coherent internal logic?

I’m concerned that, although consistently so, AI risk advocates and their supporters are largely updating on fictional evidence.

This post is meant to inquire about the foundations of their basic premises. Are they creating models to treat subsequent models or are their propositions based on fact?

Most of their arguments are based on a few conjectures and presuppositions about the behavior, drives and motivations of intelligent machines[3] and the use of probability and utility calculations to legitimate action.[4]

Explosive recursive self-improvement[5] is one of those presuppositions. The problem is that this and other presuppositions are largely ignored and left undefined. All of the disjunctive arguments put forth by AI risk advocates try to show that there are many causative factors that will result in the development of unfriendly[6] artificial general intelligence, and that only one of those factors needs to be true for us to be wiped out by an artificial general intelligence. But the whole scenario is at most as probable as the assumptions hidden in the words <artificial general intelligence> and <explosive recursive self-improvement>.

<Artificial General Intelligence> and <Explosive Recursive Self-improvement> might appear to be relatively simple and appealing concepts. But most of this superficial simplicity is a result of the vagueness of natural language descriptions. Reducing the vagueness[7] of those concepts by being more specific, or by coming up with technical definitions[8] of each of the words they are made up of, reveals the hidden complexity[9] that is concealed by the vagueness of the terms.

If we were going to define those concepts, and each of their terms, we would end up with a lot of additional concepts made up of other words or terms. Most of those additional concepts will demand explanations of their own, which will in turn result in even more speculation. If we are precise, then every declarative sentence used in the final description will have to be true simultaneously. This reveals the true complexity of all the hidden presuppositions and thereby lowers the overall probability. That is because the conclusion of an argument made up of many statements, each of which can be false, is less likely to be true: complex arguments can fail in a lot of different ways. You need to support each part of the argument that can be true or false, and you can therefore fail to support one or more of its parts, which in turn undermines the overall conclusion.
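To make the point about conjunctions concrete, here is a minimal sketch. The credences are hypothetical numbers chosen only for illustration, and the calculation assumes the premises are independent, which real premises rarely are. Even with dependence, though, a conjunction can never be more probable than its least probable premise.

```python
from math import prod


def conjunction_probability(premise_probabilities):
    """Probability that every premise holds, ASSUMING the premises are independent."""
    return prod(premise_probabilities)


# Hypothetical credences, chosen only for illustration.
five_premises = [0.9] * 5
ten_premises = [0.8] * 10

p_five = conjunction_probability(five_premises)
p_ten = conjunction_probability(ten_premises)

print(f"5 premises at 0.9 each:  {p_five:.3f}")
print(f"10 premises at 0.8 each: {p_ten:.3f}")
```

Under these assumptions, five premises that each look fairly safe at 0.9 jointly hold with a probability of about 0.59, and ten premises at 0.8 each with a probability of about 0.11.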

If the cornerstone of your argumentation, one of your basic tenets, is the likelihood of explosive recursive self-improvement, then, although it is a valid speculation, you are already in over your head with debt. Debt in the form of other kinds of evidence.

I am not saying that it is a false hypothesis, or that it is not even wrong, but that you cannot base a whole movement and a huge framework of further inference and supportive argumentation on such premises, on ideas that are themselves not based on firm ground.

The concept of an intelligence explosion, which is itself a logical implication, should not be used to make further inferences and estimations without additional evidence.

The gist of the matter is that a coherent and consistent framework of sound argumentation based on unsupported inference is nothing more than its description implies. It is fiction.

What I ask for

I would like to see AI risk advocates, or someone who is convinced of the scary idea[10][11][12][13], publish a paper that states concisely and mathematically (with extensive references if necessary) the decision procedure that led them to devote their lives[14][15] to the development of friendly artificial intelligence.[16] I want them to state numeric probability estimates[17] and exemplify their chain of reasoning: how they came up with those numbers and not others by way of sober and evidence-backed calculations.[18] I would like to see a precise and compelling review of the methodologies AI risk advocates use to arrive at their conclusions.

Concisely, the paper should account for the following issues and uncertainties:

  • The possibility that superhuman AI (artificial (general) intelligence) is too far away to be considered a risk at this time.
  • The possibility that the capability of AI will improve slowly enough for humans to adapt due to various small-scale disasters.
  • The possibility that humans are able to create a provably safe environment to reliably contain any AI and thereby impede uncontrollable self-improvement.
  • The possibility that humans will merge with superhuman tools and become competitive with AI.
  • A comparison with other existential risks[19] and how risks from artificial intelligence[20] outweigh them.
  • Whether AI risk mitigation is the best charitable cause and does not increase AI risks.[21]
  • Potential negative consequences of slowing down research on artificial intelligence (a risks and benefits analysis).[22][23]
  • The likelihood of a gradual and controllable development versus the likelihood of an intelligence explosion.[24]
  • The likelihood of unfriendly AI versus friendly AI as the outcome of practical AI research.[25]
  • The ability of superhuman intelligence and cognitive flexibility as characteristics alone to constitute a serious risk given the absence of enabling technologies like advanced nanotechnology.[26]
  • The feasibility of “provably non-dangerous AI”.
  • The disagreement of the overwhelming majority of scientists working on artificial intelligence.[27]
  • That some highly intelligent people who are aware of the position of AI risk advocates do not accept it.[28][29][30][31][32][33][34]
  • Possible conclusions that can be drawn from the Fermi paradox[35] regarding risks associated with superhuman AI versus other potential risks ahead.[36][37]

The paper should further answer the following questions and taboo “intelligence”[38] in doing so:

  • How is an AI going to become a master of dark arts[39] and social engineering[40] in order to persuade and deceive humans?
  • How is an AI going to coordinate a large scale conspiracy or deception, given its initial resources, without making any suspicious mistakes along the way?
  • How is an AI going to hack the Internet to acquire more computational resources?
  • Are those computational resources that can be hacked applicable to improve the general intelligence of an AI?
  • Does throwing more computational resources at important problems, like building new and better computational substrates, allow an AI to come up with better architectures so much faster as to outweigh the expenditure of obtaining those resources, without hitting diminishing returns?
  • Does an increase in intelligence vastly outweigh its computational cost and the expenditure of time needed to discover it?
  • How can small improvements replace conceptual revolutions that require the discovery of unknown unknowns?
  • How does an AI brute-force the discovery of unknown unknowns?
  • Is an agent of a given level of intelligence capable of handling its own complexity efficiently?
  • How is an AI going to predict how improvements, or improved versions of itself, are going to act, to ensure that its values are preserved?
  • How is an AI going to solve important problems without real-world experimentation and slow environmental feedback?
  • How is an AI going to build new computational substrates and obtain control of those resources without making use of existing infrastructure?
  • How is an AI going to cloak its actions, i.e. its energy consumption etc.?
  • How is an AI going to stop humans from using its own analytic and predictive algorithms in the form of expert systems to analyze and predict its malicious intentions?
  • How is an AI going to protect itself from human counter strikes given the fragility of the modern world and its infrastructure, e.g. without some sort of shellproof internal power supply?

In addition I would like the paper to include and lay out a formal and systematic summary of what AI risk advocates expect researchers who work on artificial general intelligence to do and why they should do so. I would like to see a clear logical argument for why people working on artificial general intelligence should listen to what AI risk advocates have to say.

“A first step is to ask people what it would take to get them to change their mind. If they refuse to give a straight answer, they can’t be taken seriously.” — John Baez

What would it take to increase my confidence that solving friendly AI is the most important problem humanity faces right now and that everyone should either actively work to solve it or contribute money to that particular cause?

To answer that question I will elaborate on some of the above points:

1.) Evidence that the invention of artificial general intelligence is likely to happen within 50-100 years from now.[41]

In other words, show that superhuman AI is not too far away to be considered a risk at this time.

For example:

  • The existence of a robot that could navigate autonomously in a real-world environment and survive real-world threats and attacks with approximately the skill of C. elegans.
  • A machine that can quickly learn to play Go[42] on its own, unassisted by humans, and beat the best human players.

2.) Evidence that the development of artificial general intelligence will take place quickly rather than gradually and slowly.

In other words, show that the capability of artificial general intelligence will improve quickly enough that humans won’t be able to adapt or learn from their mistakes due to various small-scale disasters.

For example:

  • A theorem that there likely exists an information-theoretically simple, physically and economically realizable algorithm that can be improved to self-improve explosively.
  • Prove that there likely are no strongly diminishing intelligence returns for additional compute power.[43]

3.) Prove that other problems or existential risks like global warming or advanced molecular nanotechnology are not more likely to wipe us out before the advent of advanced artificial general intelligence.

For example:

  • Show that advanced molecular nanotechnology does not come first, either by being easier than artificial general intelligence or by being a prerequisite for an advanced artificial general intelligence to be invented.

4.) Provide an outline of how an artificial intelligence is going to overpower humanity without filling in any gaps by conjecturing some sort of highly speculative technological magic.

In other words, show how an artificial general intelligence is going to create (or acquire) resources, empowering technologies or civilisatory support.

5.) Provide an outline of how current research is supposed to lead from well-behaved and fine-tuned systems to systems that stop working correctly in a highly complex and unbounded way.

In other words, show that dangerous recursive self-improvement is the default outcome of the creation of artificial general intelligence.

For example:

  • Show how something like expected utility maximization would actually work out in practice.
  • Conclusive evidence that current research will actually lead to the creation of superhuman AI designs equipped with the relevant drives that are necessary to disregard any explicit or implicit spatio-temporal scope boundaries and resource limits.

6.) Prove that trying to solve friendly AI is decreasing rather than increasing the probability of a negative utility outcome.

For example:

  • Prove that getting friendly AI almost but not quite right won’t be worse than an artificial general intelligence that was not explicitly designed to protect human values.

7.) Provide conclusive evidence that there is anything medium-probable that we can do to mitigate the risks associated with artificial general intelligence.

In other words, show that contributing money can make a difference at this time.

Notes and References

[1] fhi.ox.ac.uk

[2] wiki.lesswrong.com/wiki/Sequences

[3] Bostrom on Superintelligence and Orthogonality

[4] A reply to John Baez

[5] lesswrong.com/lw/we/recursive_selfimprovement/

[6] wiki.lesswrong.com/wiki/Paperclip_maximizer

[7] The “no self-defeating object” argument, and the vagueness paradox

[8] The Advantages of Being Technical

[9] lesswrong.com/lw/jp/occams_razor/

[10] If anyone who is actively trying to build advanced artificial general intelligence succeeds, we’re highly likely to cause an involuntary end to the human race.

[11] SL4 comment

[12] lesswrong.com/lw/2zg/ben_goertzel_the_singularity_institutes_scary/

[13] lesswrong.com/lw/wp/what_i_think_if_not_why/

[14] Video Q&A with Eliezer Yudkowsky

[15] Eliezer Yudkowsky’s advice for Less Wrong readers who want to help save the human race.

[16] wiki.lesswrong.com/wiki/Friendly_artificial_intelligence

[17] “Stop taking the numbers so damn seriously, and think in terms of subjective probability distributions, discard your mental associates between numbers and absolutes, and my choice to say a number, rather than a vague word that could be interpreted as a probability anyway, makes sense. Working on www.theuncertainfuture.com, one of the things I appreciated the most were experts with the intelligence to make probability estimates, which can be recorded, checked, and updated with evidence, rather than vague statements like “pretty likely”, which have to be converted into probability estimates for Bayesian updating anyway. Futurists, stick your neck out! Use probability estimates rather than facile absolutes or vague phrases that mean so little that you are essentially hedging yourself into meaninglessness anyway.” — Michael Anissimov (existential.ieet.org mailing list, 2010-07-11)

[18] Asteroid Deflection as a Public Good

[19] Say you believe that unfriendly AI will wipe us out with a probability of 60% and that there is another existential risk that will wipe us out with a probability of 10% even if unfriendly AI turns out to be no risk, or in all possible worlds where it comes later. Both risks have the same utility x (if we don’t assume that an unfriendly AI could also wipe out aliens etc.). Thus .6x > .1x. But if the probability of solving friendly AI, A, and the probability of solving the second risk, B, satisfy A ≤ (1/6)B, then the expected utility of working on friendly AI is at best equal to that of working on the other existential risk, because .6Ax ≤ .1Bx.
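The break-even point can be checked with a short calculation. This is only a sketch: the 60% and 10% figures come from the example above, while the value chosen for B and the function names are hypothetical. Exact fractions are used so that the equality at the threshold is not blurred by floating-point rounding.

```python
from fractions import Fraction


def expected_gain(p_risk, p_solve, utility=1):
    """Expected utility of working on a risk: P(risk) * P(solving it) * stakes."""
    return p_risk * p_solve * utility


def break_even_solve_probability(p_risk_a, p_risk_b, p_solve_b):
    """P(solving A) at which working on A is exactly as valuable as working on B."""
    return p_risk_b * p_solve_b / p_risk_a


p_ai = Fraction(6, 10)     # probability of the unfriendly AI risk (from the example)
p_other = Fraction(1, 10)  # probability of the other risk (from the example)
b = Fraction(3, 10)        # HYPOTHETICAL probability of solving the other risk

threshold = break_even_solve_probability(p_ai, p_other, b)
print(threshold)  # 1/20, i.e. B/6

# At or below the threshold, friendly AI work is at best equal in expected utility.
print(expected_gain(p_ai, threshold) == expected_gain(p_other, b))
print(expected_gain(p_ai, Fraction(1, 100)) < expected_gain(p_other, b))
```

Both comparisons print True: with B at 30%, friendly AI only comes out ahead if its chance of being solved exceeds 5%.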

Consider that one order of magnitude more utility could easily be outweighed or trumped by an underestimation of the complexity of friendly AI.

So how hard is it to solve friendly AI?

Take for example Pascal’s mugging: if you can’t solve it, then you need to implement a hack that is largely based on human intuition. Therefore, in order to estimate the possibility of solving friendly AI, one needs to account for the difficulty of solving all sub-problems.

Consider that we don’t even know “how one would start to research the problem of getting a hypothetical AGI to recognize humans as distinguished beings.”

[20] http://en.wikipedia.org/wiki/Superintelligence:_Paths,_Dangers,_Strategies

[21] By trying to solve friendly AI, AI risk advocates have to think about a lot of issues related to AI in general and might have to solve problems that will make it easier to create artificial general intelligence.

It is far from clear that organisations working on the problem of AI risks are able to protect their findings against intrusion, betrayal, or industrial espionage.

There are, furthermore, several ways in which AI risk advocates could actually cause a direct increase in negative utility.

1.) Friendly AI is incredibly hard and complex. Complex systems can fail in complex ways. Agents that are an effect of evolution have complex values. To satisfy complex values you need to meet complex circumstances. Therefore any attempt at friendly AI, which is incredibly complex, is likely to fail in unforeseeable ways. A half-baked, not quite friendly AI might create a living hell for the rest of time, increasing negative utility dramatically.

2.) Humans are not provably friendly. Given the power to shape the universe, certain organisations might fail to act altruistically and deliberately implement an AI with selfish motives or horrible strategies.

[22] Could being overcautious itself be an existential risk that might significantly outweigh the risk(s) posed by the subject of caution? Suppose that most civilizations err on the side of caution. This might cause them either to evolve much more slowly, so that the chance of a fatal natural disaster occurring before sufficient technology is developed to survive it rises to 100%, or to stop evolving altogether because they are unable to prove something sufficiently safe before trying it, and thus never take the necessary steps to become less vulnerable to naturally existing existential risks.
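How the chance of such a disaster approaches 100% with delay can be sketched as follows. The model is an assumption made purely for illustration: a constant, independent probability of a fatal natural disaster in each period. The per-period figure is hypothetical, not an estimate of any real hazard.

```python
def cumulative_disaster_probability(p_per_period, periods):
    """Chance of at least one fatal disaster within `periods` periods,
    ASSUMING a constant and independent per-period probability."""
    return 1 - (1 - p_per_period) ** periods


# Hypothetical hazard of 0.1% per period, for illustration only.
P_PER_PERIOD = 0.001

for periods in (100, 1_000, 10_000):
    risk = cumulative_disaster_probability(P_PER_PERIOD, periods)
    print(f"{periods:>6} periods of delay: {risk:.3f}")
```

Under this toy model the cumulative risk grows with every additional period of delay and tends towards 1, which is the sense in which excessive caution carries a risk of its own.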

[23] Why safety is not safe

[24] wiki.lesswrong.com/wiki/Intelligence_explosion

[25] Implicit Constraints of Practical Goals: intelligence probably implies benevolence (See also: The Fallacy of Dumb Superintelligence)

[26] en.wikipedia.org/wiki/Molecular_assembler

[27] wiki.lesswrong.com/wiki/Interview_series_on_risks_from_AI

[28] Thoughts on the Singularity Institute (SI)

[29] overcomingbias.com/2011/07/debating-yudkowsky.html

[30] overcomingbias.com/2011/06/the-betterness-explosion.html

[31] overcomingbias.com/2010/02/is-the-city-ularity-near.html

[32] John Baez, What To Do?

[33] Pascal’s scams

[34] How far can AI jump?

[35] ieet.org/index.php/IEET/more/treder20100302/

[36] SIA says AI is no big threat

[37] The Fermi paradox provides the only conclusions and data we can analyze that amount to empirical criticism of concepts like that of a Paperclip maximizer, and of general risks from superhuman AIs with non-human values, without working directly on artificial general intelligence to test those hypotheses ourselves. If you accept the premise that life is not unique and special, then one other technological civilization in the observable universe should be sufficient to leave potentially observable traces of technological tinkering. Due to the absence of any signs of intelligence out there, especially paper-clippers burning the cosmic commons, we might conclude that unfriendly AI could not be the most dangerous existential risk that we should worry about.

[38] If you are unable to answer those questions other than by invoking intelligence as some sort of magic that makes all problems disappear, the scenario that you envision is nothing more than pure fantasy!

You can’t estimate the probability and magnitude of the advantage an AI will have if you are using something that is as vague as the concept of “intelligence”.

Here is a case that bears some similarity and which might shed light on what I am trying to explain:

At his recent keynote speech at the New York Television Festival, former Star Trek writer and creator of the re-imagined Battlestar Galactica Ron Moore revealed the secret formula to writing for Trek.

He described how the writers would just insert “tech” into the scripts whenever they needed to resolve a story or plot line, then they’d have consultants fill in the appropriate words (aka technobabble) later.

“It became the solution to so many plot lines and so many stories,” Moore said. “It was so mechanical that we had science consultants who would just come up with the words for us and we’d just write ‘tech’ in the script. You know, Picard would say ‘Commander La Forge, tech the tech to the warp drive.’ I’m serious. If you look at those scripts, you’ll see that.”

Moore then went on to describe how a typical script might read before the science consultants did their thing:

La Forge: “Captain, the tech is overteching.”

Picard: “Well, route the auxiliary tech to the tech, Mr. La Forge.”

La Forge: “No, Captain. Captain, I’ve tried to tech the tech, and it won’t work.”

Picard: “Well, then we’re doomed.”

“And then Data pops up and says, ‘Captain, there is a theory that if you tech the other tech … ‘” Moore said. “It’s a rhythm and it’s a structure, and the words are meaningless. It’s not about anything except just sort of going through this dance of how they tech their way out of it.”

The use of “intelligence” is as misleading and dishonest in evaluating risks from AI as the use of “tech” in Star Trek.

[39] en.wikipedia.org/wiki/Psychological_manipulation

[40] en.wikipedia.org/wiki/Social_engineering_%28security%29

[41] How far is AGI?

[42] en.wikipedia.org/wiki/Go_(game)

[43] “…there could be non-linear complexity constrains meaning that even theoretically optimal algorithms experience strongly diminishing intelligence returns for additional compute power.” Q&A with Shane Legg on risks from AI
