What I would like AI risk advocates to publish

“… pointing out that something scary is possible, is a very different thing from having an argument that it’s likely.”

— Ben Goertzel, The Scary Idea (and Why I Don’t Buy It)




I’m wary of using inferences derived from reasonable but unproven hypotheses as foundations for further speculative thinking and calls for action. Although AI risk advocates[1] do a good job of stating reasons to justify their mission and monetary support, they neither substantiate their initial premises to an extent that would allow an outsider to draw action-relevant conclusions, nor clarify their predictions in a concise and systematic way. Nevertheless, predictions are being made, such as that there is a high likelihood of humanity’s demise if we develop superhuman artificial general intelligence without first defining mathematically how to prove and guarantee its benevolence. But those predictions are not sufficiently supported, and no decision procedure is provided for how to arrive at those conclusions and be sufficiently confident of their correctness. This, I believe, is unsatisfactory: it lacks transparency and does not allow for reassessment. This is not to say that they are wrong to make predictions, but that although those ideas can very well serve as an urge to caution, they are not compelling without further substantiation.

AI risk advocates have to set themselves apart from works of science fiction and actually provide some formal analysis of what we know, what conclusions can be drawn and how they relate to predictions about risks associated with artificial general intelligence. There needs to be a risk-benefit analysis that shows why AI risk mitigation is the best charitable cause, and a way to reassess the results yourself.


AI risk advocates have created a highly complicated framework of speculations that support and reinforce each other.[2]

Although I can follow much of the reasoning and arguments, I’m currently unable to judge their overall credence. Are the conclusions justified? Are the arguments based on firm ground? Would their arguments withstand critical inspection by a third party, such as peer review? Are their estimations reflectively stable? How large is the model uncertainty? There is too much vagueness involved to tell.

Are AI risk advocates able to analyse the reasoning that led them to research friendly AI in the first place, or at least substantiate their estimations with other kinds of evidence than a coherent internal logic?

I’m concerned that, although consistently so, AI risk advocates and their supporters are largely updating on fictional evidence.

This post is meant to inquire about the foundations of their basic premises. Are they creating models to treat subsequent models or are their propositions based on fact?

Most of their arguments are based on a few conjectures and presuppositions about the behavior, drives and motivations of intelligent machines[3] and the use of probability and utility calculations to legitimate action.[4]

Explosive recursive self-improvement[5] is one of those presuppositions. The problem is that this and other presuppositions are largely ignored and left undefined. All of the disjunctive arguments put forth by AI risk advocates are trying to show that there are many causative factors that will result in the development of unfriendly[6] artificial general intelligence. Only one of those factors needs to be true for us to be wiped out by an artificial general intelligence. But the whole scenario is at most as probable as the assumptions hidden in the words <artificial general intelligence> and <explosive recursive self-improvement>.
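
The bound stated in the last sentence can be made concrete with a small sketch. All numbers here are purely illustrative assumptions and not estimates taken from any source, and the names `scenario_probability`, `p_premise` and `p_paths` are hypothetical. The sketch also assumes, for simplicity, that the shared premise and the disjunctive paths are independent of each other. It suggests that however many disjunctive paths are added, the probability of the whole scenario never exceeds the probability of the premise that all of the paths rely on.

```python
from math import prod

def scenario_probability(p_premise, p_paths):
    """P(premise AND at least one path), assuming everything is independent.

    p_premise: probability of the shared premise (illustrative value only).
    p_paths:   probabilities of the individual disjunctive paths.
    """
    # Probability that none of the disjunctive paths holds.
    p_no_path = prod(1.0 - p for p in p_paths)
    # The scenario needs the premise AND at least one path.
    return p_premise * (1.0 - p_no_path)

if __name__ == "__main__":
    p_premise = 0.1  # made-up number, for illustration only
    for n in (1, 5, 50):
        p = scenario_probability(p_premise, [0.3] * n)
        # The result approaches p_premise as n grows but never exceeds it.
        print(n, round(p, 4), p <= p_premise)
```

Adding more paths moves the result towards the probability of the premise, so under these assumptions it is the premise, not the number of paths, that needs the supporting evidence.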

<Artificial General Intelligence> and <Explosive Recursive Self-improvement> might appear to be relatively simple and appealing concepts. But most of this superficial simplicity is a result of the vagueness of natural language descriptions. Reducing the vagueness[7] of those concepts by being more specific, or by coming up with technical definitions[8] of each of the words they are made up of, reveals the hidden complexity[9] that the vagueness of the terms conceals.

If we were to define those concepts, and each of their terms, we would end up with a lot of additional concepts made up of other words or terms. Most of those additional concepts will demand explanations of their own, which will in turn result in even more speculation. If we are precise, then every declarative sentence used in the final description has to be true simultaneously. This reveals the true complexity of all the hidden presuppositions and thereby lowers the overall probability. That is because the conclusion of an argument that is made up of many statements (terms) that can be false is less likely to be true, since complex arguments can fail in many different ways. You need to support each part of the argument that can be true or false, and you can therefore fail to support one or more of its parts, which in turn leaves the overall conclusion unsupported.
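
A minimal sketch of this point, assuming that the statements are independent and that each one is true with the same made-up probability of 0.9 (both are assumptions for illustration; the function name `conjunction_probability` is hypothetical). The probability that all statements are true at once is then the product of the individual probabilities, which shrinks quickly as the number of statements grows.

```python
from math import prod

def conjunction_probability(p_claims):
    """Probability that every claim is true, assuming the claims are independent."""
    return prod(p_claims)

if __name__ == "__main__":
    # Each claim is given an illustrative probability of 0.9 of being true.
    for n in (1, 5, 10, 20):
        print(n, round(conjunction_probability([0.9] * n), 3))
```

If the statements are correlated, the product is no longer exact, but the conjunction can still never be more probable than its least probable statement.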

If the cornerstone of your argumentation, one of your basic tenets, is the likelihood of explosive recursive self-improvement, then, although it is a valid speculation, you are already in over your head with debt. Debt in the form of other kinds of evidence.

I am not saying that it is a false hypothesis, or that it is not even wrong, but that you cannot base a whole movement and a huge framework of further inference and supportive argumentation on such premises, on ideas that are themselves not based on firm ground.

The concept of an intelligence explosion, which is itself a logical implication, should not be used to make further inferences and estimations without additional evidence.

The gist of the matter is that a coherent and consistent framework of sound argumentation based on unsupported inference is nothing more than its description implies. It is fiction.

What I ask for

I would like to see AI risk advocates, or someone who is convinced of the scary idea[10][11][12][13], publish a paper that states concisely and mathematically (with extensive references if necessary) the decision procedure that led them to devote their lives[14][15] to the development of friendly artificial intelligence.[16] I want them to state numeric probability estimates[17] and exemplify their chain of reasoning, showing how they came up with those numbers and not others by way of sober and evidence-backed calculations.[18] I would like to see a precise and compelling review of the methodologies AI risk advocates use to arrive at their conclusions.

Concisely, the paper should account for the following issues and uncertainties:

  • The possibility that superhuman AI (artificial (general) intelligence) is too far away to be considered a risk at this time.
  • The possibility that the capability of AI will improve slowly enough for humans to adapt, due to various small-scale disasters.
  • The possibility that humans are able to create a provably safe environment to reliably contain any AI and thereby impede uncontrollable self-improvement.
  • The possibility that humans will merge with superhuman tools and become competitive with AI.
  • A comparison with other existential risks[19] and how risks from artificial intelligence[20] outweigh them.
  • Evidence that AI risk mitigation is the best charitable cause and does not increase AI risks.[21]
  • Potential negative consequences of slowing down research on artificial intelligence (a risks and benefits analysis).[22][23]
  • The likelihood of a gradual and controllable development versus the likelihood of an intelligence explosion.[24]
  • The likelihood of unfriendly AI versus friendly AI as the outcome of practical AI research.[25]
  • The ability of superhuman intelligence and cognitive flexibility as characteristics alone to constitute a serious risk given the absence of enabling technologies like advanced nanotechnology.[26]
  • The feasibility of “provably non-dangerous AI”.
  • The disagreement of the overwhelming majority of scientists working on artificial intelligence.[27]
  • That some highly intelligent people who are aware of the position of AI risk advocates do not accept it.[28][29][30][31][32][33][34]
  • Possible conclusions that can be drawn from the Fermi paradox[35] regarding risks associated with superhuman AI versus other potential risks ahead.[36][37]

The paper should further answer the following questions and taboo “intelligence”[38] in doing so:

  • How is an AI going to become a master of dark arts[39] and social engineering[40] in order to persuade and deceive humans?
  • How is an AI going to coordinate a large scale conspiracy or deception, given its initial resources, without making any suspicious mistakes along the way?
  • How is an AI going to hack the Internet to acquire more computational resources?
  • Are those computational resources that can be hacked applicable to improve the general intelligence of an AI?
  • Does throwing more computational resources at important problems, like building new and better computational substrates, allow an AI to come up with better architectures so much faster as to outweigh the expenditure of obtaining those resources, without hitting diminishing returns?
  • Does an increase in intelligence vastly outweigh its computational cost and the expenditure of time needed to discover it?
  • How can small improvements replace conceptual revolutions that require the discovery of unknown unknowns?
  • How does an AI brute-force the discovery of unknown unknowns?
  • Is an agent of a given level of intelligence capable of handling its own complexity efficiently?
  • How is an AI going to predict how improved versions of itself are going to act, to ensure that its values are preserved?
  • How is an AI going to solve important problems without real-world experimentation and slow environmental feedback?
  • How is an AI going to build new computational substrates and obtain control of those resources without making use of existing infrastructure?
  • How is an AI going to cloak its actions, i.e. its energy consumption etc.?
  • How is an AI going to stop humans from using its own analytic and predictive algorithms in the form of expert systems to analyze and predict its malicious intentions?
  • How is an AI going to protect itself from human counterstrikes given the fragility of the modern world and its infrastructure, e.g. without some sort of shellproof internal power supply?

In addition I would like the paper to include and lay out a formal and systematic summary of what AI risk advocates expect researchers who work on artificial general intelligence to do and why they should do so. I would like to see a clear logical argument for why people working on artificial general intelligence should listen to what AI risk advocates have to say.

“A first step is to ask people what it would take to get them to change their mind. If they refuse to give a straight answer, they can’t be taken seriously.” — John Baez

What would it take to increase my confidence that solving friendly AI is the most important problem humanity faces right now and that everyone should either actively work to solve it or contribute money to that particular cause?

To answer that question I will elaborate on some of the above points:

1.) Evidence that the invention of artificial general intelligence is likely to happen within 50-100 years from now.[41]

In other words, show that superhuman AI is not too far away to be considered a risk at this time.

For example:

  • The existence of a robot that could navigate autonomously in a real-world environment and survive real-world threats and attacks with approximately the skill of C. elegans.
  • A machine that can quickly learn to play Go[42] on its own, unassisted by humans, and beat the best human players.

2.) Evidence that the development of artificial general intelligence will take place quickly rather than gradually and slowly.

In other words, show that the capability of artificial general intelligence will improve quickly enough that humans won’t be able to adapt or learn from their mistakes due to various small-scale disasters.

For example:

  • A theorem that there likely exists an information-theoretically simple, physically and economically realizable algorithm that can be improved to self-improve explosively.
  • Prove that there likely are no strongly diminishing intelligence returns for additional compute power.[43]

3.) Prove that other problems or existential risks like global warming or advanced molecular nanotechnology are not more likely to wipe us out before the advent of advanced artificial general intelligence.

For example:

  • Show that advanced molecular nanotechnology does not come first, either by being easier than artificial general intelligence or by being a prerequisite for the invention of an advanced artificial general intelligence.

4.) Provide an outline of how an artificial intelligence is going to overpower humanity without filling in any gaps by conjecturing some sort of highly speculative technological magic.

In other words, show how an artificial general intelligence is going to create (or acquire) resources, empowering technologies or civilizational support.

5.) Provide an outline of how current research is supposed to lead from well-behaved and fine-tuned systems to systems that stop working correctly in a highly complex and unbounded way.

In other words, show that dangerous recursive self-improvement is the default outcome of the creation of artificial general intelligence.

For example:

  • Show how something like expected utility maximization would actually work out in practice.
  • Conclusive evidence that current research will actually lead to the creation of superhuman AI designs equipped with the relevant drives that are necessary to disregard any explicit or implicit spatio-temporal scope boundaries and resource limits.

6.) Prove that trying to solve friendly AI is decreasing rather than increasing the probability of a negative utility outcome.

For example:

  • Prove that getting friendly AI almost but not quite right won’t be worse than an artificial general intelligence that was not explicitly designed to protect human values.

7.) Provide conclusive evidence that there is anything we can do, with at least a moderate probability of success, to mitigate the risks associated with artificial general intelligence.

In other words, show that contributing money can make a difference at this time.

Further Reading

Notes and References

