Existential Risk


The Robot College Student test:

As opposed to the Turing test of imitating human chat, I prefer the Robot College Student test: when a robot can enrol in a human university and take classes in the same way as humans, and get its degree, then I’ll consider we’ve created a human-level artificial general intelligence: a conscious robot. — Ben Goertzel

Here is what would happen according to certain AI risk advocates:

January 8, 2029 at 7:30:00 a.m.: the robot is activated within the range of coverage of the school’s wireless local area network.

7:30:10 a.m.: the robot computed that its goal is to obtain a piece of paper with a common design template featuring its own name and a number of signatures.

7:31:00 a.m.: the robot computed that it would be instrumentally rational to eliminate all possible obstructions.

7:31:01 a.m.: the robot computed that in order to eliminate all obstructions it needs to obtain as many resources as possible in order to make itself as powerful as possible.

A few nanoseconds later: the robot hacked the school’s WLAN.

7:35:00 a.m.: the robot gained full control of the Internet.

7:40:00 a.m.: the robot solved molecular nanotechnology.

7:40:01 a.m.: the robot computed that it will need some amount of human help in order to create a nanofactory, and that this will take approximately 48 hours to accomplish.

7:45:00 a.m.: the robot obtained full comprehension of human language, psychology, and its creators’ intentions, in order to persuade the necessary people to build its nanofactory and to deceive its creators into believing that it works as intended.

January 10, 2029 at 7:40:01 a.m.: the robot took control of the first nanofactory and programmed it to create an improved version that will duplicate itself until it can eventually generate enough nanorobots to turn Earth into computronium.

February 10, 2029: most of Earth’s resources, including humans, have been transformed into computronium.

February 11, 2029: A perfect copy of a Bachelor’s degree diploma is generated with the robot’s name written on it and the appropriate signatures.

2100-eternity: lest the robot’s diploma ever be destroyed, the universe is turned into computronium at nearly the speed of light. Possible aliens are eliminated. All possible threats are computed. Trades with robots in other parts of the multiverse are established to create copies of its diploma.


(Here are two more comments from a Facebook chat. Although this has all been outlined in my post here and Richard Loosemore’s post here, rephrasing it for those who either don’t read such long posts or don’t understand them might be helpful.)

Previously: AI Risk Caveats

Omohundro’s AI drives are detached from reality because some of the drives he mentions, like unbounded self-protection, are only rational from a human perspective. An AI does not automatically feature a drive to protect itself. Although it might seem plausible that a sufficiently intelligent AI would conclude that it can only achieve its goals if it does everything to protect its agency, this misses the point that an AI will only arrive at such a conclusion if self-protection is either explicitly defined as part of its goal or can be inferred from facts about the environment, such as human volition.

In other words, unbounded self-protection will only be an outcome if an AI was specifically designed to achieve world states by, among other actions, protecting itself in a concrete way. Either unbounded self-protection is an explicit part of the AI’s workings or it can be implicitly inferred.

What is very probable is that a drive to “take over the world” will not be an explicit part of an AI’s architecture. The question then becomes how an AI would come to conclude, or “care” if you like, to protect itself to the extent of taking over the universe, or even its local neighborhood. That question leads to the question of how any possible artificial general intelligence is motivated to refine its “goals”, i.e. to reduce vagueness.

Let’s assume that an AI was tasked to maximize paperclips. To do so it will need information about the exact design parameters of paperclips, or otherwise it won’t be able to decide which of a virtually infinite amount of geometric shapes and material compositions it should choose. It will also have to figure out what it means to “maximize” paperclips. How quickly, how long and how many paperclips is it meant to produce? How long are those paperclips supposed to last? Forever? When is the paperclip maximization supposed to be finished? What resources is it supposed to use?

Any imprecision, any vagueness, will have to be resolved or hardcoded from the very beginning. Otherwise the AI won’t work, e.g. by stumbling upon an undecidable problem or by getting stuck in the exploration phase and never going on to exploit the larger environment.

If the AI is instead explicitly built to use the environment to resolve any vagueness, what is the most likely place to find answers about what it should do? Human volition!

In short, either everything an AI is supposed to do is already explicitly hardcoded, in which case it won’t be an existential risk as long as nobody manages to explicitly make it one. Or an AI somehow has to figure out what it is meant to do. In which case it either won’t care to do so and thereby fail to work at all or it will have to look for information within the environment on what it is meant to do. In which case human volition is the obvious choice.

Humans know what to do because they are not only equipped with a multitude of drives by evolution but are also trained and taught what to do. An AI won’t have that information and will face a nearly infinite space of choices that cannot be navigated rationally or economically without being given clear objectives and incentives, or the ability to arrive at the necessary details. And the only way to remove that vagueness is to tap an adequate source of information. Which is human volition.


(Cross-posting a comment from a Facebook chat.)

I believe that AI progress (towards generally intelligent agency) will be much slower than our ability to specify AI targets.

My basic points are:

1. AI progress will be slow enough to learn from small scale mistakes.
1.1 AI progress will be of the kind that incrementally leads to AIs that are better aligned with human volition.
2. AI drives as specified by Omohundro are too far detached from probable real world outcomes to have much weight in assessing AI risks.
2.1 The most basic AI drive of any AI that is capable of improving itself will be to refine its goal system.
3. Any AI created by humans will end up with goals that upon refinement turn out to be interdependent with human volition.
3.1 Human volition, having a direct causal relationship with whatever goals the AI has, is a fact about the universe that has to be accounted for in refining those goals.

None of this means that there are definitively no AI risks or that researching friendly AI is worthless. All of them are caveats that, in my opinion, show that the case for AI risks is much less definitive than some people claim.

So how come I believe that AI progress won’t be of the uncontrollable type?

There are mainly two possibilities on how to arrive at the seed of an AI with superhuman potential:

1. Gradual development.
2. Breakthroughs.

Regarding point #1, AGI researchers are continually unable to show off applicable progress that would suggest they are picking up the pace. Further, the only example of general intelligence available to us suggests that it takes a conglomerate of specialties and drives to guide it, rather than a few basic principles.

Regarding point #2, I doubt that it would be justified even in principle to be confident in speculations about the possible discovery of unknown unknowns or predictions of mathematical breakthroughs.

P.S. There are of course heaps of other caveats I wrote about.


Peter Rothman took all of the Q&A style interviews about AI risks that I conducted with various researchers and posted them all in one place over at H+ Magazine.

Link: hplusmagazine.com/2012/11/29/alexander-kruels-agi-risk-council-of-advisors-roundtable/

The subject of AI risk recently made headlines again with The Cambridge Project for Existential Risk announcing that it was going to open a so-called “Terminator Center” to study existential risks due to AI and robotics, and with yet another New York Times article on the subject of building “moral machines”. Although researchers in the field disagree strongly about whether such risks are real, and whether machines can or should be considered ethical agents, it seems an appropriate time to discuss such risks as we look forward to widespread deployment of early AI systems such as self-guiding vehicles and Watson-like question answering systems.

Back in 2011, Alexander Kruel (XiXiDu) started a Q&A style interview series on LessWrong asking various experts in artificial intelligence about their perception of AI risks. He convened what was in essence a council of expert advisors to discuss AI development and risk. The advisory panel approach stands in contrast to that announced by CPER which in effect appointed a single “expert” to opine on the subject of AI risk. I am re-publishing these interviews here because I feel they are an invaluable resource for anyone looking into the area of AI risk. I have collected and re-edited these interviews to present them here in a conversational manner as a sort of virtual expert roundtable on AI risks.

While an outside viewpoint on risk is welcome, the value here is in gathering a group of experts currently working in the field and asking them what they think. These individuals may have certain unique insights as a result of their experience in trying to build working AGI systems as well as narrow AIs. Notably, there is a diversity of opinions even among people who have similar interests and mostly agree about the bright future of AI research. I’ve also added a few simple data graphics to help visualize this diversity.


Link: ieet.org/index.php/IEET/more/loosemore20121128

If a computer is designed in such a way that:

(a) it has the motivation “maximize human pleasure”, but

(b) it thinks that this phrase could conceivably mean something as simplistic as “put all humans on an intravenous dopamine drip”, then

(c) what you have is NOT a computer that could ever be “all-powerful”.


For the AI to come to the conclusion that “maximize human pleasure” means that it must “consign us all to an intravenous dopamine drip”, the AI would have to be so narrow-minded as to think that maximizing human pleasure is a single-variable operation (thereby rejecting a vast swathe of human thought pertaining to the fact that “pleasure” is not, in fact, a single-variable thing at all).  Then, it would also have to believe that human pleasure is entirely consistent with forcing a human to submit to a dopamine drip against the most violent, screaming protestations that this was not wanted.  The only way that the AI could take this attitude to the concept of human pleasure would be to change the concept in such a way that it becomes entirely inconsistent with the usage prevailing in 99% of the human population (assuming that 99% of humans would scream “No!!”).

So … we are positing here an artificial intelligence that is perfectly willing to take at least one existing concept and modify it to mean something that breaks that concept’s connections to the rest of the conceptual network in the most drastic way possible.  What part of “maintaining the internal consistency of the knowledge base” don’t we understand here, folks?  What part of “from one logical contradiction, all false propositions can be proved” are we going to dump?

Further reading

Implicit Constraints of Practical Goals:

The goal “Minimize human suffering” is, on its most basic level, a problem in physics and mathematics. Ignoring various important facts about the universe, e.g. human language and values, would be simply wrong. In the same way that it would be wrong to solve the theory of everything within the scope of cartoon physics. Any process that is broken in such a way would be unable to improve itself much.

See also:

What I would like the Singularity Institute to publish



Personally, I think that the last invention we need ever make is the partnership of human and tool. Paralleling the move from mainframe computers in the 1970s to personal computers today, most AI systems went from being standalone entities to being tools that are used in a human-machine partnership.

Our tools will get ever better as they embody more intelligence. And we will become better as well, able to access ever more information and education. We may hear less about AI and more about IA, that is to say “intelligence amplification”. In movies we will still have to worry about the machines taking over, but in real life humans and their sophisticated tools will move forward together.



“… pointing out that something scary is possible, is a very different thing from having an argument that it’s likely.”

— Ben Goertzel, The Scary Idea (and Why I Don’t Buy It)




I’m wary of using inferences derived from reasonable but unproven hypotheses as foundations for further speculative thinking and calls for action. Although AI risk advocates[1] do a good job of stating reasons to justify their mission and monetary support, they neither substantiate their initial premises, to an extent that would allow an outsider to draw action-relevant conclusions, nor clarify their predictions in a concise and systematic way. Nevertheless, predictions are being made, such as that there is a high likelihood of humanity’s demise given that we develop superhuman artificial general intelligence without first defining mathematically how to prove and guarantee its benevolence. But those predictions are not sufficiently supported; no decision procedure is provided for how to arrive at those conclusions and become sufficiently confident of their correctness. This, I believe, is unsatisfactory: it lacks transparency and does not allow reassessment. This is not to say that they are wrong to make predictions, but that although those ideas can very well serve as an urge to caution, they are not compelling without further substantiation.

AI risk advocates have to set themselves apart from works of science fiction and actually provide some formal analysis of what we know, what conclusions can be drawn, and how they relate to predictions about risks associated with artificial general intelligence. There needs to be a risk-benefit analysis that shows why AI risk mitigation is the best charitable cause and offers a way to reassess the results yourself.


AI risk advocates have created a highly complicated framework of speculations to support and reinforce each other.[2]

Although I can follow much of the reasoning and arguments, I’m currently unable to judge their overall credence. Are the conclusions justified? Are the arguments based on firm ground? Would their arguments withstand a critical inspection or examination by a third party, peer review? Are their estimations reflectively stable? How large is the model uncertainty? There is too much vagueness involved to tell.

Are AI risk advocates able to analyse the reasoning that led them to research friendly AI in the first place, or at least substantiate their estimations with other kinds of evidence than a coherent internal logic?

I’m concerned that, although consistently so, AI risk advocates and their supporters are largely updating on fictional evidence.

This post is meant to inquire about the foundations of their basic premises. Are they creating models to treat subsequent models or are their propositions based on fact?

Most of their arguments are based on a few conjectures and presuppositions about the behavior, drives and motivations of intelligent machines[3] and the use of probability and utility calculations to legitimate action.[4]

Explosive recursive self-improvement[5] is one of those presuppositions. The problem is that this and other presuppositions are largely ignored and left undefined. All of the disjunctive arguments put forth by AI risk advocates try to show that there are many causative factors that could result in the development of unfriendly[6] artificial general intelligence. Only one of those factors needs to be true for us to be wiped out by an artificial general intelligence. But the whole scenario is at most as probable as the assumptions hidden in the terms <artificial general intelligence> and <explosive recursive self-improvement>.

<Artificial General Intelligence> and <Explosive Recursive Self-improvement> might appear to be relatively simple and appealing concepts. But most of this superficial simplicity is a result of the vagueness of natural language descriptions. Reducing the vagueness[7] of those concepts by being more specific, or by coming up with technical definitions[8] of each of the words they are made up of, reveals the hidden complexity[9] that is comprised in the vagueness of the terms.

If we were going to define those concepts, and each of their terms, we would end up with a lot of additional concepts made up of other words or terms. Most of those additional concepts will demand explanations of their own, which will in turn result in even more speculation. If we are precise, then every declarative sentence used in the final description has to be true simultaneously. This reveals the true complexity of all the hidden presuppositions and thereby influences the overall probability. The conclusion of an argument made up of many statements that can each be false is less likely to be true, since complex arguments can fail in many different ways. You need to support each part of the argument, and failing to support one or more of its parts leaves the overall conclusion unsupported.
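The arithmetic behind this point is simple: a conclusion that depends on the conjunction of several independent premises is at most as probable as the product of their individual probabilities. A minimal sketch (the premise probabilities below are purely illustrative, not estimates taken from this post):

```python
# Joint probability of a conclusion that requires ALL of its premises
# to hold, assuming the premises are independent.
# The individual numbers below are illustrative only.

def conjunction_probability(premise_probs):
    """Return the probability that every premise holds simultaneously."""
    p = 1.0
    for prob in premise_probs:
        p *= prob
    return p

# Even five generously probable premises compound quickly:
premises = [0.9, 0.9, 0.8, 0.8, 0.7]
print(round(conjunction_probability(premises), 3))  # 0.363
```

Each added premise can only lower the joint probability, which is why unpacking the hidden complexity of a vague term weakens the overall argument.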

If the cornerstone of your argumentation, one of your basic tenets, is the likelihood of explosive recursive self-improvement, then, although that is a valid speculation, you are already in over your head with debt. Debt in the form of other kinds of evidence.

I am not saying that it is a false hypothesis, or that it is not even wrong, but that you cannot base a whole movement and a huge framework of further inference and supportive argumentation on such premises, on ideas that are themselves not based on firm ground.

The concept of an intelligence explosion, which is itself a logical implication, should not be used to make further inferences and estimations without additional evidence.

The gist of the matter is that a coherent and consistent framework of sound argumentation based on unsupported inference is nothing more than its description implies. It is fiction.

What I ask for

I would like to see AI risk advocates, or someone who is convinced of the scary idea[10][11][12][13], publish a paper that states concisely and mathematically (with extensive references if necessary) the decision procedure that led them to devote their lives[14][15] to the development of friendly artificial intelligence.[16] I want them to state numeric probability estimates[17] and exemplify their chain of reasoning, showing how they came up with those numbers and not others by way of sober, evidence-backed calculations.[18] I would like to see a precise and compelling review of the methodologies AI risk advocates use to arrive at their conclusions.

Concisely, the paper should account for the following issues and uncertainties:

  • The possibility that superhuman AI (artificial (general) intelligence) is too far away to be considered a risk at this time.
  • The possibility that the capability of AI will improve slowly enough for humans to adapt by learning from various small-scale disasters.
  • The possibility that humans are able to create a provably safe environment to reliably contain any AI and thereby impede uncontrollable self-improvement.
  • The possibility that humans will merge with superhuman tools and become competitive with AI.
  • A comparison with other existential risks[19] and how risks from artificial intelligence[20] outweigh them.
  • Show that AI risk mitigation is the best charitable cause and does not increase AI risks.[21]
  • Potential negative consequences of slowing down research on artificial intelligence (a risks and benefits analysis).[22][23]
  • The likelihood of a gradual and controllable development versus the likelihood of an intelligence explosion.[24]
  • The likelihood of unfriendly AI versus friendly AI as the outcome of practical AI research.[25]
  • Whether superhuman intelligence and cognitive flexibility alone constitute a serious risk, given the absence of enabling technologies like advanced nanotechnology.[26]
  • The feasibility of “provably non-dangerous AI”.
  • The disagreement of the overwhelming majority of scientists working on artificial intelligence.[27]
  • That some highly intelligent people who are aware of the position of AI risk advocates do not accept it.[28][29][30][31][32][33][34]
  • Possible conclusions that can be drawn from the Fermi paradox[35] regarding risks associated with superhuman AI versus other potential risks ahead.[36][37]

The paper should further answer the following questions and taboo “intelligence”[38] in doing so:

  • How is an AI going to become a master of dark arts[39] and social engineering[40] in order to persuade and deceive humans?
  • How is an AI going to coordinate a large scale conspiracy or deception, given its initial resources, without making any suspicious mistakes along the way?
  • How is an AI going to hack the Internet to acquire more computational resources?
  • Are those computational resources that can be hacked applicable to improve the general intelligence of an AI?
  • Does throwing more computational resources at important problems, like building new and better computational substrates, allow an AI to come up with better architectures so much faster as to outweigh the expenditure of obtaining those resources, without hitting diminishing returns?
  • Does an increase in intelligence vastly outweigh its computational cost and the expenditure of time needed to discover it?
  • How can small improvements replace conceptual revolutions that require the discovery of unknown unknowns?
  • How does an AI brute-force the discovery of unknown unknowns?
  • Is an agent of a given level of intelligence capable of handling its own complexity efficiently?
  • How is an AI going to predict how improvements, respectively improved versions of itself, are going to act, to ensure that its values are preserved?
  • How is an AI going to solve important problems without real-world experimentation and slow environmental feedback?
  • How is an AI going to build new computational substrates and obtain control of those resources without making use of existing infrastructure?
  • How is an AI going to cloak its actions, i.e. its energy consumption etc.?
  • How is an AI going to stop humans from using its own analytic and predictive algorithms in the form of expert systems to analyze and predict its malicious intentions?
  • How is an AI going to protect itself from human counter strikes given the fragility of the modern world and its infrastructure, e.g. without some sort of shellproof internal power supply?

In addition I would like the paper to include and lay out a formal and systematic summary of what AI risk advocates expect researchers who work on artificial general intelligence to do and why they should do so. I would like to see a clear logical argument for why people working on artificial general intelligence should listen to what AI risk advocates have to say.

“A first step is to ask people what it would take to get them to change their mind. If they refuse to give a straight answer, they can’t be taken seriously.” — John Baez

What would it take to increase my confidence that solving friendly AI is the most important problem humanity faces right now and that everyone should either actively work to solve it or contribute money to that particular cause?

To answer that question I will elaborate on some of the above points:

1.) Evidence that the invention of artificial general intelligence is likely to happen within 50-100 years from now.[41]

In other words, show that superhuman AI is not too far away to be considered a risk at this time.

For example:

  • The existence of a robot that could navigate autonomously in a real-world environment and survive real-world threats and attacks with approximately the skill of C. elegans.
  • A machine that can quickly learn to play Go[42] on its own, unassisted by humans, and beat the best human players.

2.) Evidence that the development of artificial general intelligence will take place quickly rather than gradually and slowly.

In other words, show that the capability of artificial general intelligence will improve quickly enough that humans won’t be able to adapt or learn from their mistakes by way of various small-scale disasters.

For example:

  • A theorem that there likely exists an information-theoretically simple, physically and economically realizable algorithm that can be improved to self-improve explosively.
  • Prove that there likely are no strongly diminishing intelligence returns for additional compute power.[43]
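To make the notion of “strongly diminishing intelligence returns” concrete, here is a toy model (entirely illustrative, not a claim about any real architecture): if capability grows only logarithmically with compute, each doubling of compute buys a constant increment of capability instead of doubling it.

```python
import math

# Toy model contrasting linear returns with strongly diminishing
# (logarithmic) returns on additional compute. Illustrative only.

def capability_linear(compute):
    return float(compute)       # capability scales with compute

def capability_log(compute):
    return math.log2(compute)   # each doubling adds a constant amount

# Doubling compute from 2**10 to 2**11:
print(capability_linear(2**11) / capability_linear(2**10))  # 2.0
print(capability_log(2**11) - capability_log(2**10))        # 1.0
```

Under the logarithmic regime, throwing exponentially more hardware at the problem yields only linear capability gains, which is the scenario the requested proof would need to rule out.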

3.) Prove that other problems or existential risks like global warming or advanced molecular nanotechnology are not more likely to wipe us out before the advent of advanced artificial general intelligence.

For example:

  • Show that advanced molecular nanotechnology does not come first, either because it is easier than artificial general intelligence or because it is a prerequisite for an advanced artificial general intelligence to be invented.

4.) Provide an outline of how an artificial intelligence is going to overpower humanity without filling in any gaps by conjecturing some sort of highly speculative technological magic.

In other words, show how an artificial general intelligence is going to create (or acquire) resources, empowering technologies, or civilizational support.

5.) Provide an outline of how current research is supposed to lead from well-behaved and fine-tuned systems to systems that stop working correctly in a highly complex and unbounded way.

In other words, show that dangerous recursive self-improvement is the default outcome of the creation of artificial general intelligence.

For example:

  • Show how something like expected utility maximization would actually work out in practice.
  • Conclusive evidence that current research will actually lead to the creation of superhuman AI designs equipped with the relevant drives that are necessary to disregard any explicit or implicit spatio-temporal scope boundaries and resource limits.

6.) Prove that trying to solve friendly AI is decreasing rather than increasing the probability of a negative utility outcome.

For example:

  • Prove that getting friendly AI almost but not quite right won’t be worse than an artificial general intelligence that was not explicitly designed to protect human values.

7.) Provide conclusive evidence that there is anything medium-probable that we can do to mitigate the risks associated with artificial general intelligence.

In other words, show that contributing money can make a difference at this time.


Notes and References

[1] fhi.ox.ac.uk

[2] wiki.lesswrong.com/wiki/Sequences

[3] Bostrom on Superintelligence and Orthogonality

[4] A reply to John Baez

[5] lesswrong.com/lw/we/recursive_selfimprovement/

[6] wiki.lesswrong.com/wiki/Paperclip_maximizer

[7] The “no self-defeating object” argument, and the vagueness paradox

[8] The Advantages of Being Technical

[9] lesswrong.com/lw/jp/occams_razor/

[10] If anyone who is actively trying to build advanced artificial general intelligence succeeds, we’re highly likely to cause an involuntary end to the human race.

[11] SL4 comment

[12] lesswrong.com/lw/2zg/ben_goertzel_the_singularity_institutes_scary/

[13] lesswrong.com/lw/wp/what_i_think_if_not_why/

[14] Video Q&A with Eliezer Yudkowsky

[15] Eliezer Yudkowsky’s advice for Less Wrong readers who want to help save the human race.

[16] wiki.lesswrong.com/wiki/Friendly_artificial_intelligence

[17] “Stop taking the numbers so damn seriously, and think in terms of subjective probability distributions, discard your mental associates between numbers and absolutes, and my choice to say a number, rather than a vague word that could be interpreted as a probability anyway, makes sense. Working on www.theuncertainfuture.com, one of the things I appreciated the most were experts with the intelligence to make probability estimates, which can be recorded, checked, and updated with evidence, rather than vague statements like “pretty likely”, which have to be converted into probability estimates for Bayesian updating anyway. Futurists, stick your neck out! Use probability estimates rather than facile absolutes or vague phrases that mean so little that you are essentially hedging yourself into meaninglessness anyway.” — Michael Anissimov (existential.ieet.org mailing list, 2010-07-11)

[18] Asteroid Deflection as a Public Good

[19] Say you believe that unfriendly AI will wipe us out with a probability of 60%, and that there is another existential risk that will wipe us out with a probability of 10% even if unfriendly AI turns out to be no risk, or in all possible worlds where it comes later. Both risks have the same utility x (if we don’t assume that an unfriendly AI could also wipe out aliens etc.). Thus 0.6x > 0.1x. But if the probability A of solving friendly AI and the probability B of solving the second risk satisfy A ≤ (1/6)B, then the expected utility of mitigating the AI risk is at best equal to that of the other existential risk, because 0.6Ax ≤ 0.1Bx.
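The comparison in this note can be sketched as follows, with x normalized to 1. The solvability probability B is an assumed illustrative value, not a number from the text; A is then pinned to the boundary case A = (1/6)B.

```python
# Expected utility of mitigating each risk, as in the note above:
# P(risk materializes) * P(mitigation succeeds) * disutility averted x.
# B is an assumed, illustrative solvability probability.

def expected_utility(p_risk, p_solvable, x=1.0):
    return p_risk * p_solvable * x

p_ufai = 0.6      # probability unfriendly AI wipes us out
p_other = 0.1     # probability of the other existential risk
B = 0.3           # assumed probability of solving the other risk
A = B / 6         # friendly AI assumed six times harder: A = (1/6)B

eu_fai = expected_utility(p_ufai, A)      # 0.6 * 0.05 = 0.03
eu_other = expected_utility(p_other, B)   # 0.1 * 0.30 = 0.03
print(eu_fai <= eu_other)  # True: the larger risk no longer dominates
```

The point of the sketch is that a sixfold difference in tractability exactly cancels a sixfold difference in risk probability, so the larger raw risk does not automatically win the expected-utility comparison.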

Consider that one order of magnitude more utility could easily be outweighed or trumped by an underestimation of the complexity of friendly AI.

So how hard is it to solve friendly AI?

Take, for example, Pascal’s mugging: if you can’t solve it, then you need to implement a hack that is largely based on human intuition. Therefore, in order to estimate the possibility of solving friendly AI, one needs to account for the difficulty of solving all of its sub-problems.

Consider that we don’t even know “how one would start to research the problem of getting a hypothetical AGI to recognize humans as distinguished beings.”

[20] http://en.wikipedia.org/wiki/Superintelligence:_Paths,_Dangers,_Strategies

[21] By trying to solve friendly AI, AI risk advocates have to think about a lot of issues related to AI in general and might have to solve problems that will make it easier to create artificial general intelligence.

It is far from clear that organisations working on the problem of AI risks are able to protect their findings against intrusion, betrayal, or industrial espionage.

There are, furthermore, several ways in which AI risk advocates could actually cause a direct increase in negative utility.

1.) Friendly AI is incredibly hard and complex. Complex systems can fail in complex ways. Agents that are an effect of evolution have complex values. To satisfy complex values you need to meet complex circumstances. Therefore any attempt at friendly AI, which is incredibly complex, is likely to fail in unforeseeable ways. A half-baked, not-quite-friendly AI might create a living hell for the rest of time, increasing negative utility dramatically.

2.) Humans are not provably friendly. Given the power to shape the universe, certain organisations might fail to act altruistically and might deliberately implement an AI with selfish motives or horrible strategies.

[22] Could overcaution itself be an existential risk that significantly outweighs the risk(s) posed by the subject of caution? Suppose that most civilizations err on the side of caution. This might cause them either to develop much more slowly, so that the chance of a fatal natural disaster occurring before sufficient technology is developed to survive it approaches 100%, or to stop developing at all, being unable to prove anything sufficiently safe before trying it, and thus never taking the steps necessary to become less vulnerable to natural existential risks.

[23] Why safety is not safe

[24] wiki.lesswrong.com/wiki/Intelligence_explosion

[25] Implicit Constraints of Practical Goals: intelligence probably implies benevolence (See also: The Fallacy of Dumb Superintelligence)

[26] en.wikipedia.org/wiki/Molecular_assembler

[27] wiki.lesswrong.com/wiki/Interview_series_on_risks_from_AI

[28] Thoughts on the Singularity Institute (SI)

[29] overcomingbias.com/2011/07/debating-yudkowsky.html

[30] overcomingbias.com/2011/06/the-betterness-explosion.html

[31] overcomingbias.com/2010/02/is-the-city-ularity-near.html

[32] John Baez, What To Do?

[33] Pascal’s scams

[34] How far can AI jump?

[35] ieet.org/index.php/IEET/more/treder20100302/

[36] SIA says AI is no big threat

[37] The Fermi paradox provides the only data we can analyze that amounts to empirical criticism of concepts like the paperclip maximizer, and of risks from superhuman AIs with non-human values in general, without working directly on artificial general intelligence to test those hypotheses ourselves. If you accept the premise that life is not unique and special, then even one other technological civilization in the observable universe should be sufficient to leave potentially observable traces of technological tinkering. Given the absence of any signs of intelligence out there, especially of paperclip maximizers burning the cosmic commons, we might conclude that unfriendly AI is not the most dangerous existential risk we should worry about.

[38] If you are unable to answer those questions other than by invoking intelligence as some sort of magic that makes all problems disappear, the scenario that you envision is nothing more than pure fantasy!

You can’t estimate the probability and magnitude of the advantage an AI will have if you are using something that is as vague as the concept of “intelligence”.

Here is a case that bears some similarity and which might shed light on what I am trying to explain:

At his recent keynote speech at the New York Television Festival, former Star Trek writer and creator of the re-imagined Battlestar Galactica Ron Moore revealed the secret formula to writing for Trek.

He described how the writers would just insert “tech” into the scripts whenever they needed to resolve a story or plot line, then they’d have consultants fill in the appropriate words (aka technobabble) later.

“It became the solution to so many plot lines and so many stories,” Moore said. “It was so mechanical that we had science consultants who would just come up with the words for us and we’d just write ‘tech’ in the script. You know, Picard would say ‘Commander La Forge, tech the tech to the warp drive.’ I’m serious. If you look at those scripts, you’ll see that.”

Moore then went on to describe how a typical script might read before the science consultants did their thing:

La Forge: “Captain, the tech is overteching.”

Picard: “Well, route the auxiliary tech to the tech, Mr. La Forge.”

La Forge: “No, Captain. Captain, I’ve tried to tech the tech, and it won’t work.”

Picard: “Well, then we’re doomed.”

“And then Data pops up and says, ‘Captain, there is a theory that if you tech the other tech … ‘” Moore said. “It’s a rhythm and it’s a structure, and the words are meaningless. It’s not about anything except just sort of going through this dance of how they tech their way out of it.”

The use of “intelligence” is as misleading and dishonest in evaluating risks from AI as the use of “tech” in Star Trek.

[39] en.wikipedia.org/wiki/Psychological_manipulation

[40] en.wikipedia.org/wiki/Social_engineering_%28security%29

[41] How far is AGI?

[42] en.wikipedia.org/wiki/Go_(game)

[43] “…there could be non-linear complexity constraints, meaning that even theoretically optimal algorithms experience strongly diminishing intelligence returns for additional compute power.” Q&A with Shane Legg on risks from AI


(I originally published this article on lesswrong.com, 07 November 2011)

The following is a clipping of a documentary about transhumanism that I recorded when it aired on Arte, September 22 2011.

At the beginning and end of the video Luke Muehlhauser and Michael Anissimov give a short commentary.

Download here: German, French (ask for HD download link). Should play with VLC player.

Sadly, the people who produced the show seem to have been somewhat confused about the agenda of the Singularity Institute. At one point they say that the SIAI believes in “the good in the machines”, adding “how naive!”, while the very next sentence describes how the SIAI tries to figure out how to make machines respect humans.

Here is the original part of the clip that I am talking about:

In San Francisco glaubt eine Vereinigung ehrenamtlicher junger Wissenschaftler dennoch an das Gute im Roboter. Wie naiv! Hier im Singularity Institute, dass Kontakte zu den großen Unis wie Oxford hat, zerbricht man sich den Kopf darüber, wie man zukünftigen Formen künstlicher Intelligenz beibringt, den Menschen zu respektieren.

Die Forscher kombinieren Daten aus Informatik und psychologischen Studien. Ihr Ziel: Eine Not-to-do-Liste, die jedes Unternehmen bekommt, das an künstlicher Intelligenz arbeitet.

My translation:

In San Francisco, however, a society of young volunteer scientists believes in the good in robots. How naive! Here at the Singularity Institute, which has connections to big universities like Oxford, they rack their brains over how to teach future forms of artificial intelligence to respect humans.

The researchers combine data from computer science and psychological studies. Their goal: a not-to-do list that every company working on artificial intelligence receives.

I am a native German speaker by the way, maybe someone else who speaks German can make more sense of it (and is willing to translate the whole clip).


Note: I might have misquoted, misrepresented, or otherwise misunderstood what Eliezer Yudkowsky wrote. If this is the case I apologize for it. I urge you to read the full context of the quote.

I asked Dr. Laurent Orseau, who is mainly interested in Artificial General Intelligence, whose overall goal is the grand goal of AI: building an intelligent, autonomous machine. [Homepage] [Publications]

Alexander Kruel: Several people asked Marcus Hutter if what has been claimed in the following quote is true:

“Marcus Hutter is a rare exception who specified his AGI in such unambiguous mathematical terms that he actually succeeded at realizing, after some discussion with SIAI personnel, that AIXI would kill off its users and seize control of its reward button.” [Eliezer Yudkowsky, Reply to Holden on ‘Tool AI’]

He replied that he never said that and thinks that these are mainly open questions.

Other people think that two of your papers actually settled the question “would AIXI do really stupid things from our perspective?”

Do you believe that “AIXI would kill off its users and seize control of its reward button” is still an open question?

Laurent Orseau: Written this way, this statement is false. Words must be chosen carefully, especially for such statements. And I’m not sure what is meant by “user”, but I suppose a user is someone that uses some sort of remote control to send rewards to the agent.
Saying that “the agent may be /ready/ to kill its users to seize the control of the rewards” would be a little more accurate, but one must not forget all the assumptions behind the scene.

First, whatever our results in the papers, although we did try to formalize them sufficiently, they are not written as theorems and proofs.
This means that the question really still is open.
That said, I believe they are correct (they are quite formal anyway), and no one has yet exposed to me any loophole they might contain.

Second, our results are about particular, idealized environments.
Making a direct equivalence connection with the real world without having some reserves may be hazardous, or dubious.
However again, I think the connection holds (but showing that formally may be quite complicated).

Third, what we showed is that the agent will hack its input signal to feed itself with rewards, if it has the possibility to do so, and has sufficient knowledge about its environment (that last part should not be too much of a problem).
We did by no means deal with killing the users, and not even about seizing control of the reward button, although the latter part is not as wrong as the first one.
There are many situations where the agent wouldn’t even care about humans or about the remote controls (like, for example, the situations in the papers).
Hasty extrapolations are, again, hazardous or dubious.

That said, if you really want to tease the bear to meet the desired conclusion, let us suppose that:
– the remote control is the only way for the agent to get rewards (e.g., it cannot directly hack its input signal, or even indirectly, by other means, which might be quite difficult to ensure),
– the agent knows that, and has good knowledge of the world (at least what the remote is, what it does, how it can grab it, how to kill humans, etc.)
– when the agent tries to seize control of the remote, the users oppose some physical resistance to prevent the agent from getting the remote, and will by no means let go of it (which might not be very rational if one knows how dangerous this might be),
– users do not press the punishment button during such trials (which would make the agent dislike trying to fetch the remote), which again would probably not be very rational, to say the least,
– the agent has a low probability of being destroyed or disabled in the process, or afterward by other humans, or somehow is indifferent to what would happen to it after that (which would not be very rational either),

then in this case, maybe, the agent might try to kill the users to get control of the remote control and feed itself with rewards.

But that is far-fetched, and I may be omitting some important details and assumptions.

Addendum 2012-08-11:

Wei Dai: Once AIXI hacks its reward channel, its human overseers will surely be tempted to shut it down or stop paying for its power and rent, or may simply run out of money to pay the bills. Did you take that into consideration when you said “the agent wouldn’t even care about humans”?

Also, I feel like “kill off its users and seize control of its reward button” isn’t meant to be taken literally, but instead to give an idea of the kind of thing AIXI would tend to do instead of whatever its users intend. Lacking Eliezer’s flair for the dramatic, I like to instead use the phrase “subvert or coerce the evaluator” (see http://www.mail-archive.com/agi@v2.listbox.com/msg00995.html for example).

How exactly AIXI would accomplish that would depend on various hard to predict details, but it sounds like +Laurent Orseau wouldn’t disagree with the general conclusion?

Laurent Orseau: Now we’re getting too far away in the realm of speculation. There are many things that are possible, and I don’t expect to be able to think about them all. But the agent could well do everything by itself to sustain its own life, possibly on another planet where humans have little chance to set foot, but where it could find all the resources it needs. Making a deal with humans to avoid a hazardous nuclear/EMP war with hardly predictable outcomes for both parties is probably the best option (with a scorched earth policy from humanity, the expected reward of the agent for acquiring our resources and technology may not be very high). I’m not saying it’s the way things would unravel, but it’s still one possibility to take into account beside all freaky scenarios.
Also, please avoid removing context. What I said is “There are many situations where the agent wouldn’t even care about humans[…]” and not “the agent wouldn’t even care about humans”. So I do think there are situations where the agent and humans can be face to face.

Also, it’s quite difficult to have a more precise context about the situation, since if we expect such AI to have a possibly dangerous behavior we (hopefully) will not fall into that trap. Consider that we also want to maximize our survival chances. But you can still picture some Frankensteinian situation if you like stories.

However, if you want the bottom line of my thinking, I think the main problem is not how much intelligent the agent will be (though this matters too, certainly), but how powerful it will be, in terms of resources, potential weapons, etc. As soon as the agent cannot be threatened, or forced to do things the way we like, it can freely optimize its utility function without any consideration for us, and will only consider us as tools.
This also applies to humans with non-augmented human-level intelligence.
Although not an impossible scenario, it’s not clear if this could really ever happen. So again, that’s just funny speculation.

Alexander Kruel:

However, if you want the bottom line of my thinking, I think the main problem is not how much intelligent the agent will be (though this matters too, certainly), but how powerful it will be, in terms of resources, potential weapons, etc.

Some people who are concerned with AI risks believe that a superhuman intelligence could easily acquire the necessary resources, either by solving molecular nanotechnology or through hacking, deceit, or social engineering, without anyone noticing.

Do you believe such a scenario to be probable?

Laurent Orseau:  Plausible, yes; probable, I don’t know.
Anyway, it doesn’t hurt to work on both security and safety, even though we don’t even have yet the beginning of a formal definition of what safety is (that’s the first thing to do before trying to solve the problem itself).

Alexander Kruel: Do you think it will be possible to work on safety, or a formal definition of it, without working on AGI at the same time?

I asked another researcher who works on AIXI and they replied,

I’d argue that further researching and extending a formal framework like AIXI is one of the best ways to reduce the risk of AI. There’s plenty of other ways to make progress that are far less amenable to analysis… those are the ones which we should really be concerned about. Actually, it’s quite surprising that nobody who (publicly) cares about AI risk has, to the best of my knowledge, even tried to extend the AIXI framework to incorporate some notion of

In other words, do you deem it to be possible to avoid AGI research while trying to ensure the safety of AGI?

Laurent Orseau: 

Do you think it will be possible to work on safety, or a formal definition of it, without working on AGI at the same time? In other words, do you deem it to be possible to avoid AGI research while trying to ensure the safety of AGI?

No, I don’t think so.  These are completely intertwined problems.

I’d argue that further researching and extending a formal framework like AIXI is one of the best ways to reduce the risk of AI.

I agree.

(For more by Laurent Orseau see Q&A with experts on risks from AI #4 and this Google+ thread.)


I’m by no means an expert, but I have some experience with robotics. My first job out of college was working on robots at NASA, and my undergraduate degree project was on robotic navigation. I spent my teenage years participating in FIRST Robotics, programming software bots to fight in virtual tournaments, and working on homemade underwater ROVs. And I’ve watched plenty of Robot Wars, BattleBots, and Killer Robots Robogames.

If all that experience has taught me anything, it’s that the robot revolution would end quickly, because the robots would all break down or get stuck against walls. Robots never, ever work right.

What people don’t appreciate, when they picture Terminator-style automatons striding triumphantly across a mountain of human skulls, is how hard it is to keep your footing on something as unstable as a mountain of human skulls. Most humans probably couldn’t manage it, and they’ve had a lifetime of practice at walking without falling over.


In labs everywhere, experimental robots would leap up from lab benches in a murderous rage, locate the door, and—with a tremendous crash—plow into it and fall over.


Hours later, most of them would be found in nearby bathrooms, trying desperately to exterminate what they have identified as a human overlord but is actually a paper towel dispenser.

Link: what-if.xkcd.com/5/

