
Here are some interesting scenarios with low or unstable probabilities but potentially enormous pay-offs. Some of the given arguments in favor of taking these scenarios seriously are also thought-provoking.

Note that not all of the descriptions below are quotes; some are short summaries which might not adequately reflect the original author’s statements. Please read up on the original sources, which are provided after the description of each scenario. Also note that I do not want to judge any of these scenarios but merely list them here in order to highlight possible similarities. And despite the title, it is not my intention to suggest that the scenarios listed here are cases of Pascal’s wager, but merely that there seems to be no clear cutoff between Pascal’s wager type arguments and finite expected value calculations.

The order in which these scenarios are listed is roughly by how seriously I take them, where the scenario listed at the end is the one that I take the least seriously.

1. Large asteroid strikes are low-probability, high-fatality events: so high-fatality that by some estimates the probability of dying from an asteroid strike is on the same order as the probability of dying in an airplane crash. [Source: Planetary Defense is a Public Good]

2. It’s often argued that voting is irrational, because the probability of affecting the outcome is so small. But the outcome itself is extremely large when you consider its impact on other people. Voting might be worth a charitable donation of somewhere between $100 and $1.5 million. [Source: Voting is like donating thousands of dollars to charity]
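The expected-value reasoning behind point 2 can be sketched in a few lines. All three numbers below are illustrative assumptions of mine, not figures from the cited source:

```python
# Illustrative sketch of the voting argument: a tiny probability of
# deciding the election, multiplied by a large benefit spread over many
# people, can yield a non-trivial expected value. All numbers are
# assumptions for illustration only.
p_decisive = 1 / 10_000_000      # assumed chance that one vote decides the election
benefit_per_person = 100         # assumed dollar value of the better outcome, per person
people_affected = 300_000_000    # assumed number of people affected

expected_benefit = p_decisive * benefit_per_person * people_affected
print(expected_benefit)          # roughly $3000 of expected benefit to others
```

With more pessimistic or more optimistic assumptions, the same one-line formula spans something like the $100 to $1.5 million range quoted above.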

3. A system that is optimizing a function of n variables, where the objective depends on a subset of size k<n, will often set the remaining unconstrained variables to extreme values; if one of those unconstrained variables is actually something we care about, the solution found may be highly undesirable. A highly capable decision maker can have an irreversible impact on humanity. None of this proves that AI will be the end of the world. But there is no need for a proof, just a convincing argument pointing to a more-than-infinitesimal possibility. [Source: Of Myths And Moonshine]
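The effect described in point 3 is easy to reproduce with a toy optimizer. The following sketch (all names and numbers are made up for illustration) maximizes an objective that mentions only the first variable:

```python
# Toy demonstration of point 3: the objective depends only on x, but the
# optimizer must still pick SOME value for y -- and vertex-based methods
# (like the simplex method for linear programs) put it at a bound.
from itertools import product

def maximize_over_box(objective, bounds):
    """Maximize `objective` by enumerating the vertices of a box.
    A linear objective always attains its optimum at a vertex."""
    return max(product(*bounds), key=objective)

bounds = [(0, 1), (-1000, 1000)]   # x in [0, 1], y in [-1000, 1000]
x, y = maximize_over_box(lambda v: v[0], bounds)   # "maximize x"; y is never mentioned

print(x, y)   # x is 1, as asked -- but y sits at an extreme of its range
```

If y happens to be something we care about, the solution is optimal by the stated objective and undesirable by ours.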

4. We should cut way back on accidental yelling to aliens, such as via Arecibo radar sending, if continuing at current rates would over the long run bring even a one in a billion chance of alerting aliens to come destroy us. And even if this chance is now below one in a billion, it will rise with time and eventually force us to cut back. So let’s start now to estimate such risks, and adapt our behavior accordingly. [Source: Should Earth Shut the Hell Up?]

5. GMOs might introduce “systemic risk” to the environment. The chance of ecocide, or the destruction of the environment and potentially humans, increases incrementally with each additional transgenic trait introduced into the environment. The downside risks are so hard to predict — and so potentially bad — that it is better to be safe than sorry. The benefits, no matter how great, do not merit even a tiny chance of an irreversible, catastrophic outcome. [Source: The Trouble With the Genetically Modified Future]

6. Cooling something to a temperature close to absolute zero might be an existential risk. Given our ignorance we cannot rationally give zero probability to this possibility, and probably not even give it less than 1% (since that is about the natural lowest error rate of humans on anything). Anybody saying it is less likely than one in a million is likely very overconfident. [Source: Cool risks outside the envelope of nature]

7. Fundamental physical operations — atomic movements, electron orbits, photon collisions, etc. — could collectively deserve significant moral weight. The total number of atoms or particles is huge: even assigning a tiny fraction of human moral consideration to them or a tiny probability of them mattering morally will create a large expected moral value. [Source: Is there suffering in fundamental physics?]

8. Suppose someone comes to me and says, “Give me five dollars, or I’ll use my magic powers from outside the Matrix to run a Turing machine that simulates and kills 3^^^^3 people.” A compactly specified wager can grow in size much faster than it grows in complexity. The utility of a Turing machine can grow much faster than its prior probability shrinks. [Source: Pascal’s Mugging: Tiny Probabilities of Vast Utilities]

I will expand this list as I come across similar scenarios.


New Rationalism is an umbrella term for a category of people who tend to take logical implications, or what they call “the implied invisible”, very seriously.

Someone who falls into the category of New Rationalism fits one or more of the following descriptions:

  • The person entertains hypotheses that are highly speculative. These hypotheses are in turn based on fragile foundations, which are only slightly less speculative than the hypotheses themselves. Sometimes these hypotheses are many levels removed from empirically verified facts or evident and uncontroversial axioms.
  • Probability estimates of the person’s hypotheses are highly unstable and highly divergent between different people.
  • The person’s hypotheses are either unfalsifiable by definition, too vague, or almost impossibly difficult to falsify.
  • It is not possible to update on evidence, because the person’s hypotheses do not discriminate between world states where they are right versus world states where they are wrong. Either the only prediction made by the hypotheses is the eventual validation of the hypotheses themselves, or the prediction is sufficiently vague as to allow the predictor to ignore any evidence to the contrary.
  • The person’s hypotheses either have no or only obscure decision relevant consequences.
  • The person tends to withdraw from real-world feedback loops.

A person who falls into the category of New Rationalism might employ one or more of the following rationalizations:

  • The burden of proof is reversed. The person demands that their critics provide strong evidence against their beliefs before they are allowed to dismiss them.
  • The scientific method, scientific community, and domain experts are discredited as being inadequate, deficient, irrational or stupid.
  • Conjecturing enormous risks and then using them as leverage to make weak hypotheses seem vastly more important or persuasive than they really are.
  • Arguing that you should not assign a negligible probability to a hypothesis (the author’s hypothesis) being true, because that would require an accuracy that is reliably greater than your objective accuracy.
  • Arguing that by unpacking a complex scenario into its parts you will underestimate its probability, because it is very easy to take any event, including events which have already happened, and make it look very improbable by turning one pathway to it into a long series of conjunctions.

New Rationalists believe that armchair theorizing is enough to discern reality from fantasy, or that it is at least sufficient to take the resulting hypotheses seriously enough to draw action-relevant conclusions from them.

This stance has resulted in hypotheses similar to solipsism (which any sane person rejects at an early age): hypotheses that are not obviously flawed, but which cannot be falsified.

The problem with New Rationalists is not that they take seriously what follows from established facts or sound arguments; that practice is generally valid. For example, it is valid to believe that there are stars beyond the cosmological horizon, even though it is not possible to observe them, to directly retrieve information about them, or to empirically verify their existence. The problem is that they don’t stop there. They use such implications as foundations for further speculations, which are then accepted as new foundations from which they can draw further conclusions.

A textbook example of what is wrong with New Rationalism is this talk by Jaan Tallinn (transcript), which relies on several ideas, each of which is itself speculative:

This talk combines the ideas of intelligence explosion, the multiverse, the anthropic principle, and the simulation argument, into an alternative model of the universe – a model where, from the perspective of a human observer, technological singularity is the norm, not the exception.

A quote from the talk by Jaan Tallinn:

We started by observing that living and playing a role in the 21st century seems to be a mind-boggling privilege, because the coming singularity might be the biggest event in the past and future history of the universe. Then we combined the computable multiverse hypothesis with the simulation argument, to arrive at the conclusion that in order to determine how special our century really is, we need to count both the physical and virtual instantiations of it.

We further talked about the motivations of post-singularity superintelligences, speculating that they might want to use simulations as a way to get in touch with each other. Finally we analyzed a particular simulation scenario in which superintelligences are searching for one another in the so called mind space, and found that, indeed, this search should generate a large number of virtual moments near the singularity, thus reducing our surprise in finding ourselves in one.

Note how all of the underlying hypotheses, although accepted by New Rationalists, are somewhat speculative and not established facts. Each of them is nonetheless valid to entertain on its own. The problem starts when you begin making dependent hypotheses that rely on a number of unestablished initial hypotheses. It gets worse as the chain becomes more fragile, with further conclusions drawn from hypotheses that are already N levels removed from established facts. But the biggest problem is that eventually action-relevant conclusions are drawn and acted upon.

The problem is that logical implications can reach out indefinitely, and that humans are spectacularly bad at making such inferences. This is why the amount of empirical evidence required to accept a belief should be proportional to its distance from established facts.

It is much more probable that we’re going to make everything worse, or waste our time, than that we’re actually maximizing expected utility when trying to act on conjunctive, non-evidence-backed speculations. Such speculations are not only improbable, but very likely based on fallacious reasoning.

As computationally bounded agents we are forced to restrict ourselves to empirical evidence and falsifiable hypotheses. We need to discount certain obscure low probability hypotheses. Otherwise we will fall prey to our own shortcomings and inability to discern fantasy from reality.



(The following is adapted from a scenario by Graham Priest, depicted in his book ‘Logic: A Very Short Introduction’.)

Suppose that at some point you find yourself in a posthuman hell. But you have one chance to get out of it. You can toss a coin; if it comes down heads, you are out and go to heaven. If it comes down tails, you stay in hell forever. The coin is not a fair one, however, and the posthuman entity that simulates the hell has control of the odds. If you toss it today, the chance of heads is 1/2 (i.e. 1-1/2^1). If you wait till tomorrow, the chance goes up to 3/4 (i.e. 1-1/2^2). If you wait n days, the chance of going to heaven goes up to 1-1/2^n. How long are you going to wait before tossing the coin?

The associated values of remaining in hell or escaping are constant over time.
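A few lines of code make the paradox explicit. The utilities for heaven and hell below are illustrative assumptions, but any constant values with heaven above hell give the same pattern:

```python
# Sketch of the waiting paradox from the quoted scenario. The chance of
# heads on day n is 1 - 1/2**n; the values of heaven and hell are
# assumed constants (any fixed values with heaven > hell work the same).
V_HEAVEN, V_HELL = 1.0, -1.0

def expected_value(n):
    """Expected utility of tossing the coin on day n."""
    p_heads = 1 - 0.5 ** n
    return p_heads * V_HEAVEN + (1 - p_heads) * V_HELL

# Waiting one more day is always strictly better...
assert all(expected_value(n + 1) > expected_value(n) for n in range(1, 40))
# ...so a pure expected-utility maximizer waits forever, and never leaves hell.
```

Each extra day of waiting adds a strictly positive amount of expected utility, so no finite waiting time is ever optimal; following the calculation literally means staying in hell.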


Link: gowers.wordpress.com/2012/11/05/mathematics-meets-real-life/

The two that bothered me most (and still do) are stroke and death. There are other serious things that can go wrong, but if their effects are temporary, then for me that puts them in a different league from a stroke, which could end my productive life, and death, which would end my life altogether.

The risk of death is put at one in a thousand, and this is where things get interesting. How worried should I be about a 0.1% risk? How do I even think about that question? Perhaps if my life expectancy from now on is around 30 years, I should think of this as an expected loss of 30/1000 years, or about 10 days. That doesn’t sound too bad — about as bad as having a particularly nasty attack of flu. But is it right to think about it in terms of expectations? I feel that the distribution is important: I would rather have a guaranteed loss of ten days than a 1/1000 chance of losing 30 years.

In the end, what convinced me that I shouldn’t worry too much about this risk was looking up what the risk of death is anyway over, say, the next year. I found on this site that the average risk of death in the UK for a man between 45 and 54 is 1/279, much higher than 1/1000. So if I am worried about a 1/1000 mortality rate from an operation, I should be about as worried that I will die from some other cause over the next four months or so. And yet I don’t lose any sleep over that possibility.

But maybe the problem is that I am concentrating four months’ risk into a few hours. Doesn’t that change everything?
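The back-of-envelope numbers in the quoted passage check out; here is a quick sketch using the figures quoted above:

```python
# Verifying the arithmetic in the quote: a 1/1000 risk of death against
# ~30 years of remaining life expectancy, and the UK baseline mortality
# figure of 1/279 per year for men aged 45-54.
surgical_risk = 1 / 1000
remaining_years = 30

expected_loss_days = surgical_risk * remaining_years * 365
print(expected_loss_days)   # ~11 days: the quote's "about 10 days"

baseline_annual_risk = 1 / 279
# How long must the baseline risk accumulate before it matches 1/1000?
months_to_match = surgical_risk / baseline_annual_risk * 12
print(months_to_match)      # ~3.4 months: the quote's "about four months"
```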

This is a perfect example of the kind of scenarios that I would love to see being dissolved by lesswrong.com. I’d love to learn how to rationally handle such real world situations, rather than how to think about hypothetical distant superintelligences…

[Google+ Thread]


Link: interfluidity.com/v2/3570.html

If everyone in your clan is what we’ll call “narrowly rational”, and so abstains from voting, the predictable outcome will be bad. But it is not rational, for individuals within a group that will foreseeably face a Prisoners’ Dilemma, to shrug and say “that sucks” and wait for everything to go to hell. Instead, people work to find means of reshaping their confederates’ behavior to prevent narrowly rational but collectively destructive choices.


A smarty-pants might come along and point out the weak foundations of the pro-voting ideology, declaring that he is only being rational and his compatriots are clearly mistaken. But it is our smarty-pants who is being irrational. Suppose he makes the “decisive argument” (which one is much more likely to make than to cast the decisive vote, since the influence of well crafted words need not be proportionate to 1/n). By telling “the truth” to his kinsmen, he is very directly reducing his own utility, not to mention the cost he bears if his preferences include within-group altruism. In order to be rational, we must profess to others and behave as though we ourselves believe things which are from a very reductive perspective false, even when those behaviors are costly. That is to say, in order to behave rationally, our relationship to claims like “your vote counts!” must be empirically indistinguishable from belief, whether or not we understand the sense in which the claim is false.

Of course, it would be perfectly rational for a smarty-pants to make his wrongheaded but compelling argument about the irrationality of voting to members of the other clan. But it would be irrational for members of either group to take such arguments seriously, by whomever they are made and despite the sense in which they are true.

So, when elections have strong intergroup distributional consequences, not only is voting rational, misleading others about the importance of each vote is also rational, as is allowing oneself to be misled (unless you are sure you are an ubermensch apart, and the conditions of your immunity don’t imply that others will also be immune).


This page lists all my criticisms of artificial general intelligence as a catastrophic risk.

Introduction: A Primer On Risks From AI

For my probability estimates of unfriendly and friendly AI, see here.


AIs, Goals, and Risks

Link: kruel.co/2014/03/25/ais-goals-and-risks/


The concepts of a “terminal goal” and of a “Do-What-I-Mean dynamic” are fallacious. The former can’t be grounded without leading to an infinite regress. The latter erroneously makes a distinction between (a) the generally intelligent behavior of an AI and (b) whether an AI behaves in accordance with human intentions, since the generally intelligent behavior of intelligently designed machines is implemented intentionally.

Smarter and smarter, then magic happens…

Link: kruel.co/2013/07/23/smarter-and-smarter-then-magic-happens/


(1) Present-day software is better than previous software generations at understanding and doing what humans mean.

(2) There will be future generations of software which will be better than the current generation at understanding and doing what humans mean.

(3) If there is better software, there will be even better software afterwards.

(4) Magic happens.

(5) Software will be superhuman good at understanding what humans mean but catastrophically worse than all previous generations at doing what humans mean.

On literal genies and complex goals

Link: kruel.co/2014/11/02/on-literal-genies-and-complex-goals/


Imagine that advanced aliens came to Earth and removed all of your unnecessary motives, desires and drives and made you completely addicted to “znkvzvmr uhzna unccvarff”. All your complex human values are gone. All you have is this massive urge to do “znkvzvmr uhzna unccvarff”, everything else has become irrelevant. They made “znkvzvmr uhzna unccvarff” your terminal goal.

Implicit constraints of practical goals

Link: kruel.co/2012/05/11/implicit-constraints-of-practical-goals/


The goal “Minimize human suffering” is, on its most basic level, a problem in physics and mathematics. Ignoring various important facts about the universe, e.g. human language and values, would be simply wrong. In the same way that it would be wrong to solve the theory of everything within the scope of cartoon physics. Any process that is broken in such a way would be unable to improve itself much.

AI vs. humanity and the lack of concrete scenarios

Link: kruel.co/2013/06/01/ai-vs-humanity-and-the-lack-of-concrete-scenarios/


This post is supposed to be a preliminary outline of how to analyze concrete scenarios in which an advanced artificial general intelligence attempts to transform Earth in a catastrophic way.

What I would like AI risk advocates to publish

Link: kruel.co/2012/11/03/what-i-would-like-ai-risk-advocates-to-publish/


I would like to see AI risk advocates, or anyone who is convinced of the scary idea, publish a paper that states concisely and mathematically (with extensive references if necessary) the decision procedure that led them to devote their lives to the development of friendly artificial intelligence. I want them to state numeric probability estimates and to exemplify their chain of reasoning: how they came up with those numbers and not others by way of sober and evidence-backed calculations. I would like to see a precise and compelling review of the methodologies AI risk advocates used to arrive at their conclusions.

Interview series on risks from AI

Link: http://wiki.lesswrong.com/wiki/Interview_series_on_risks_from_AI

Also: hplusmagazine.com/2012/11/29/alexander-kruels-agi-risk-council-of-advisors-roundtable/ 


In 2011, Alexander Kruel (XiXiDu) started a Q&A style interview series asking various people about their perception of artificial intelligence and possible risks associated with it.

Risks from AI and Charitable Giving

Link: kruel.co/2012/05/11/risks-from-ai-and-charitable-giving/


In this post I just want to take a look at a few premises (P#) that need to be true simultaneously to make AI risk mitigation a worthwhile charitable cause from the point of view of someone trying to do as much good as possible by contributing money. I am going to show that the case of risks from AI is strongly conjunctive, that without a concrete and grounded understanding of AGI an abstract analysis of the issues is going to be very shaky, and that therefore AI risk mitigation is likely to be a bad choice as a charity. In other words, that which speaks in favor of AI risk mitigation consists mainly of highly specific, conjunctive, non-evidence-backed speculations on possible bad outcomes.

Why I am skeptical of risks from AI

Link: kruel.co/2011/07/21/why-i-am-skeptical-of-risks-from-ai/


In principle we could build antimatter weapons capable of destroying worlds, but in practice it is much harder to accomplish.

There are many question marks when it comes to the possibility of superhuman intelligence, and many more about the possibility of recursive self-improvement. Most of the arguments in favor of those possibilities solely derive their appeal from being vague.

Objections to Coherent Extrapolated Volition

Link: kruel.co/2011/07/22/objections-to-coherent-extrapolated-volition/


It seems to me that becoming more knowledgeable and smarter is gradually altering our utility functions. But what is it that we are approaching if the extrapolation of our volition becomes a purpose in and of itself? Extrapolating our coherent volition will distort or alter what we really value by installing a new cognitive toolkit designed to achieve an equilibrium between us and other agents with the same toolkit.

Would a singleton be a tool that we can use to get what we want or would the tool use us to do what it does, would we be modeled or would it create models, would we be extrapolating our volition or rather follow our extrapolations?

Four arguments against AI risks

Link: kruel.co/2013/07/11/four-arguments-against-ai-risks/


I list four, not necessarily independent, caveats against AI risks that would be valid even if one was to accept (1) that AI will be invented soon enough to be decision relevant at this point in time (2) that the kind of uncontrollable recursive self-improvement imagined by AI risks advocates was even in principle possible (3) that the advantage of greater intelligence scales with the task of taking over the world in such a way that it becomes probable that an AI will succeed in doing so even given the lack of concrete scenarios on how that is supposed to happen.

AI drives vs. practical research and the lack of specific decision procedures

Link: kruel.co/2013/06/01/ai-drives-vs-practical-research-and-the-lack-of-specific-decision-procedures/


The objective of this post is (1) to outline how to examine the possibility of the emergence of dangerous goals in generally intelligent systems in the light of practical research and development and (2) to determine what decision procedures would cause generally intelligent systems to exhibit catastrophic side effects.

To beat humans you have to define “winning”

Link: kruel.co/2013/07/14/to-beat-humans-you-have-to-define-winning/


People who claim that artificial general intelligence is going to constitute an existential risk implicitly conjecture that whoever creates such an AI will know perfectly well how to formalize capabilities such as <become superhuman good at mathematics>, while at the same time failing selectively at making it solve the mathematics they want it to solve, instead causing it to solve the mathematics that is necessary to kill all humans.

If you claim that it is possible to define the capability <become superhuman good at mathematics> then you will need a very good argument in order to support the claim that at the same time it is difficult to define goals such as <build a house> without causing human extinction.

Reply to Stuart Armstrong on Dumb Superintelligence

Link: kruel.co/2013/07/19/reply-to-stuart-armstrong-on-dumb-superintelligence/


Here is a reply to the post ‘The idiot savant AI isn’t an idiot’ which I sent to Stuart Armstrong yesterday by email. Since someone has now linked to one of my posts on LessWrong, I thought I would make the full reply public.

Distilling the “dumb superintelligence” argument

Link: kruel.co/2013/07/21/distilling-the-dumb-superintelligence-argument/


The intersection of the sets of “intelligently designed AIs” and “dangerous AIs” only contains those AIs which are deliberately designed to be dangerous by malicious humans.

Thank you for steelmanning my arguments

Link: kruel.co/2013/07/22/thank-you-for-steelmanning-my-arguments/


A further refinement of the argument against the claim that fully intended behavior is a very small target to hit.

Goals vs. capabilities in artificial intelligence

Link: kruel.co/2013/07/19/goals-vs-capabilities-in-artificial-intelligence/


The distinction between terminal goals, instrumental goals, and an AI’s eventual behavior is misleading for practical AIs. What actions an AI is going to take depends on its general design, not on a specific part of its design that someone happened to label “goal”.

To make your AI interpret something literally you have to define “literally”

Link: kruel.co/2013/07/22/to-make-your-ai-interpret-something-literally-you-have-to-define-literally/


The capability to “understand understanding correctly” is a prerequisite for any AI to be capable of taking over the world. At the same time, that capability will make it avoid taking over the world, as long as doing so does not accurately reflect what it is meant to do.

Questions regarding the nanotechnology-AI-risk conjunction

Link: kruel.co/2013/06/02/questions-regarding-the-nanotechnology-ai-risk-conjunction/


Posing questions examining what I call the nanotechnology-AI-risk conjunction, by which I am referring to a scenario that is often mentioned by people concerned about the idea of an artificial general intelligence (short: AI) attaining great power.

AI risk scenario: Deceptive long-term replacement of the human workforce

Link: kruel.co/2013/06/03/ai-risk-scenario-deceptive-long-term-replacement-of-the-human-workforce/


Some questions about a scenario related to the possibility of an advanced artificial general intelligence (short: AI) overpowering humanity. For the purpose of this post I will label the scenario a deceptive long-term replacement of the human workforce. As with all such scenarios it makes sense to take a closer look by posing certain questions about what needs to be true in order for a given scenario to work out in practice and to be better able to estimate its probability.

AI risk scenario: Elite Cabal

Link: kruel.co/2013/06/03/ai-risk-scenario-elite-cabal/


Some remarks and questions about a scenario outlined by Mitchell Porter on how an existential risk scenario involving advanced artificial general intelligence might be caused by a small but powerful network of organizations working for a great power in the interest of national security.

AI risk scenario: Social engineering

Link: kruel.co/2013/06/22/ai-risk-scenario-social-engineering/


Some remarks and questions about a scenario outlined in the LessWrong post ‘For FAI: Is “Molecular Nanotechnology” putting our best foot forward?’ on how an artificial general intelligence (short: AI) could take control of Earth by means of social engineering, rigging elections and killing enemies.

AI risk scenario: Insect-sized drones

Link: kruel.co/2013/06/28/ai-risk-scenario-insect-sized-drones/


Some remarks and questions about a scenario outlined by Tyler Cowen in which insect-sized drones are used to kill people or to carry out terror attacks.

AI risks scenario: Biological warfare

Link: kruel.co/2013/06/28/ai-risks-scenario-biological-warfare/


Remarks and questions about the use of biological toxins or infectious agents by an artificial general intelligence (short: AI) to decisively weaken and eventually overpower humanity.

Realistic AI risk scenarios

Link: http://kruel.co/2013/07/24/realistic-ai-risk-scenarios/


Scenarios that I deem to be realistic, in which an artificial intelligence (AI) constitutes a catastrophic or existential risk (or worse), are mostly of the kind in which “unfriendly” humans use such AIs as tools facilitating the achievement of human goals.

How does a consequentialist AI work?

Link: kruel.co/2013/07/14/how-does-a-consequentialist-ai-work/


The idea of a consequentialist expected utility maximizer is used to infer that artificial general intelligence constitutes an existential risk.

Can we say anything specific about how such an AI could work in practice? And if we are unable to approximate a practical version of such an AI, is it then sensible to use it as a model to make predictions about the behavior of practical AIs?

Narrow vs. General Artificial Intelligence

Link: kruel.co/2013/07/13/narrow-vs-general-artificial-intelligence/

Addendum: kruel.co/2013/07/13/wrong-answers-on-jeopardy-vs-human-extinction/


A comparison chart of the behavior of narrow and general artificial intelligence when supplied with the same task.

Addendum: If an artificial general intelligence was prone to commit errors on the scale of confusing goals such as “win at Jeopardy” with “kill all humans” then it would never succeed at killing all humans because it would make similar mistakes on a wide variety of problems that are necessary to solve in order to do so.

Furniture robots as an existential risk? Beware cached thoughts!

Link: kruel.co/2013/01/28/furniture-robots-as-an-existential-risk-beware-cached-thoughts/


Don’t just assume vague ideas such as <explosive recursive self-improvement>, try to approach the idea in a piecewise fashion. Start out with some narrow AI such as IBM Watson or Apple’s Siri and add various hypothetical self-improvement capabilities, but avoid quantum leaps. Try to locate at what point those systems start acting in an unbounded fashion, possibly influencing the whole world in a catastrophic way. And if you manage to locate such a tipping-point then take it apart even further. Start over and take even smaller steps, be more specific. How exactly did your well-behaved expert system end up being an existential risk?

Being specific about AI risks

Link: kruel.co/2013/01/26/being-specific-about-ai-risks/


The only way you can arrive at any scenario where an artificial general intelligence is going to kill all humans is by being vague and unspecific, by ignoring real world development processes and by using natural language to describe some sort of fantasy scenario and invoke lots of technological magic.

Once you have to come up with a concrete scenario and outline specifically how that is supposed to happen you’ll notice that you will never actually reach such a tipping point as long as you do not deliberately design the system to behave in such a way.

Taking over the world to compute 1+1

Link: kruel.co/2013/01/24/taking-over-the-world-to-compute-11/


If your superintelligence is too dumb to realize that it doesn’t have to take over the world in order to compute 1+1 then it will never manage to take over the world in the first place.

C. elegans vs. human-level AI

Link: kruel.co/2013/07/16/c-elegans-vs-human-level-ai/


Reading the Wikipedia entry on Caenorhabditis elegans and how much we already understand about this small organism and its 302 neurons makes me even more skeptical of the claim that a human-level artificial intelligence (short: AI) will be created within this century.

How far is AGI?

Link: http://kruel.co/2012/05/13/how-far-is-agi/


I don’t believe that people like Jürgen Schmidhuber are a risk, apart from a very abstract possibility.

The reason is that they are unable to show off some applicable progress on a par with IBM Watson or Siri. And in the case that they claim that their work relies on a single mathematical breakthrough, I doubt that it would be justified, even in principle, to be confident in that prediction.

Superapish intelligence

Link: http://kruel.co/2012/05/13/superapish-intelligence/


The argument from the gap between chimpanzees and humans is interesting but cannot be used to extrapolate onwards from human general intelligence.

The Fallacy of AI Drives

Link: kruel.co/2013/01/14/the-fallacy-of-ai-drives/


I don’t think that a sufficiently intelligent AI will constitute an existential risk.

Description of an AI risk scenario by analogy with nanotechnology 

Link: kruel.co/2013/09/08/description-of-an-ai-risk-scenario-by-analogy-with-nanotechnology/


Framed in terms of nanofactories, here is my understanding of a scenario imagined by certain AI risk advocates, in which an artificial general intelligence (AGI) causes human extinction.

Discussion about catastrophic risks from artificial intelligence

Link: kruel.co/2013/09/07/discussion-about-catastrophic-risks-from-artificial-intelligence/


A discussion about risks associated with artificial general intelligence, mainly between myself, Richard Loosemore, and Robby Bensinger.



Related to: A Much Better Life?

Reply to: Why No Wireheading?

The Sales Conversation

Sales girl: Our Much-Better-Life Simulator™ is going to provide the most enjoyable life you could ever experience.

Customer: But it is a simulation, it is fake. I want the real thing, I want to live my real life.

Sales girl: We accounted for all possibilities and determined that the expected utility of your life outside of our Much-Better-Life Simulator™ is dramatically lower.

Customer: You don’t know what I value and you can’t make me value what I don’t want. I told you that I value reality over fiction.

Sales girl: We accounted for that as well! Let me ask you how much utility you assign to one hour of ultimate well-being™, where ‘ultimate’ means the best possible satisfaction of all desirable bodily sensations a human body and brain are capable of experiencing?

Customer: Hmm, that’s a tough question. I am not sure how to assign a certain amount of utility to it.

Sales girl: You say that you value reality more than what you call ‘fiction’. But you nonetheless value fiction, right?

Customer: Yes of course, I love fiction. I read science fiction books and watch movies like most humans do.

Sales girl: Then how much more would you value one hour of ultimate well-being™ by other means compared to one hour of ultimate well-being™ that is the result of our Much-Better-Life Simulator™?

Customer: If you ask me like that, I would exchange ten hours in your simulator for one hour of real satisfaction, something that is the result of an actual achievement rather than your fake.

Sales girl: Thank you. Would you agree if I said that, for you, one hour outside, which is ten times less satisfying, roughly equals one hour in our simulator?

Customer: Yes, for sure.

Sales girl: Then you should buy our product. Not only is it very unlikely that you will experience even a tenth of the ultimate well-being™ we offer more than a few times per year, but our simulator delivers it constantly and allows your brain to experience 20 times more perceptual data than you could experience outside of our simulator, all while experiencing ultimate well-being™. And we offer free upgrades that are expected to deliver exponential speed-ups and qualitative improvements for the next few decades.

Customer: Thanks, but no thanks. I’d rather enjoy the real thing.

Sales girl: But I showed you that our product easily outweighs the additional amount of utility you expected to experience outside of our simulator.

Customer: You just tricked me into this utility thing, I don’t want to buy your product. Please leave me alone now.
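The sales girl’s pitch can be turned into a back-of-envelope calculation. The numbers below are just the hypothetical ones from the dialogue (the customer’s 10× discount on simulated experience and the simulator’s claimed 20× speed-up), kept as integer ‘utils’ so the arithmetic stays exact:

```python
# Numbers taken from the dialogue above; they are illustrative,
# not a claim about anyone's actual preferences.
U_ULTIMATE = 10   # utils for one hour of ultimate well-being by real means
DISCOUNT = 10     # the customer values simulated experience 10x less
SPEEDUP = 20      # the simulator packs 20x the subjective experience per hour

# One real-world hour spent inside the simulator:
u_sim_hour = SPEEDUP * U_ULTIMATE // DISCOUNT   # 20 utils

# One typical real-world hour outside (about 10x less satisfying than
# ultimate well-being, as the sales girl suggests):
u_real_hour = U_ULTIMATE // DISCOUNT            # 1 util

print(u_sim_hour, u_real_hour)  # 20 1
```

On these numbers an hour inside beats a typical hour outside by a factor of twenty, which is why the customer can only escape the conclusion by rejecting the framework rather than the arithmetic.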

Utility Maximization

You first have to realize that it is not possible to consider only utility preferences between “world states”. You also have to assign utility to discrete items in order to deal with novel discoveries.

Think about it this way: how does a hunter-gatherer integrate category theory into their utility function?

World states are not uniform entities but compounds of different items, different features, each adding a certain amount of utility, or weight, to the overall value of the world state.

To consider only utility preferences between world states that are not made up of all the items of your utility function would be a dramatic oversimplification. A world state that features a certain item must be different from one that doesn’t feature it, even if the difference is tiny. So if I ask how much utility you assign to a certain item, I am asking how you weigh that item, how the absence of that item would affect the value of a world state. I am asking about your utility preferences between possible world states that feature a certain item and those that don’t.
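The item-based picture described above can be sketched as a weighted sum; the items and weights here are made up purely for illustration:

```python
# A world state as a set of items; its utility is the sum of the weights
# assigned to the items it contains (hypothetical items and weights).
item_utility = {"health": 10.0, "friends": 5.0, "books": 2.0}

def world_state_utility(state):
    # Items with no assigned weight (novel discoveries) default to zero.
    return sum(item_utility.get(item, 0.0) for item in state)

with_books = world_state_utility({"health", "friends", "books"})
without_books = world_state_utility({"health", "friends"})
print(with_books - without_books)  # 2.0, exactly the weight of the item
```

A novel discovery, like the hunter-gatherer’s category theory, shows up here as an item with no entry in the table: it contributes nothing until the function is extended with a weight for it.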

Now back to the gist of this post.

If you are human and subscribe to rational, consistent, unbounded utility maximization, then you assign at least non-negligible utility to unconditional bodily sensations. If you further accept uploading, and that emulations can experience more in a shorter period of time than fleshly humans, then it is a serious possibility that this sheer quantity of experience can outweigh the extra utility you assign to the referents of rewards, such as bodily sensations caused by real achievements, or real agents instead of chatbots (a difference you can choose to forget).

Utility maximization destroys complex values by choosing the value that yields the most utility, i.e. the best cost-value ratio.

One unit of utility is not discriminable from another unit of utility. All a utility maximizer can do is maximize expected utility. If it turns out that one of its complex values can be effectively realized and optimized, that value might come to outweigh all other values. This can only be countered by changing one’s utility function and reassigning utility in such a way as to outweigh that effect, which will lead to inconsistency, or by discounting the value that threatens to outweigh all others, which will again lead to inconsistency.
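The concentration effect described above can be illustrated with a toy maximizer: given several values with different cost-value ratios, maximizing expected utility sends the entire budget to the single best ratio. The names and numbers are hypothetical:

```python
# Utils per unit of resource spent on each value (made-up figures).
values = {"friendship": 5.0, "art": 4.0, "easily_optimized_value": 50.0}
budget = 100

# A straightforward maximizer allocates everything to the best ratio:
best = max(values, key=values.get)
allocation = {name: (budget if name == best else 0) for name in values}
print(allocation)  # the entire budget goes to "easily_optimized_value"
```

The other values receive nothing, which is the sense in which straightforward utility maximization destroys complex values.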


Followup to: Acknowledge and allow for your needs

In his post ‘The End of Rationality’, muflax wrote:

I’m basically done with rationality.

Ok, seriously now. I’ve always enjoyed XiXiDu‘s criticisms on LW, but for over a year now, whenever I read his stuff I wonder why he keeps on making it. I mean, he has been saying (more-or-less correctly so, I think) that SIAI and the LW sequences score high on any crackpot test, that virtually no expert in the field takes any of it seriously, that rationality (in the LW sense) has not shown any tangible results, that there are problems so huge you can fly a whole deconstructor fleet through, that the Outside View utterly disagrees with both the premises and conclusions of most LW thought, that actually taking it seriously should drive people insane […]

The keyword here is approximation. Just because general relativity and quantum mechanics break down in describing singularities it doesn’t mean that we’re “done with” those theories.

If some type of otherwise rational behavior leads to absurd, undesirable or unbearable consequences, then, in the absence of a better heuristic, you approximate the behavior as far as possible.

All you have to realize is that a reflective equilibrium is possible: a state where you balance all kinds of evidence with your preferences, elementary needs, and computational and general resource limitations.

There are basically four weighted levels:

  • Level 1: Contemplation/rationality (conscious, reflective high-level cognition; trying to do what is objectively right).
  • Level 2: Instinct, intuition and gut feeling (tapping your unconscious evolutionary resources).
  • Level 3: Satisfaction of elementary needs (doing what you have to do because you need to do it, which includes having fun; paying attention to your limitations).
  • Level 4: Doing what you want based on naive introspection.

Level 1 should, as far as I can tell, carry the most weight. But the weighting can change based on the circumstances. For example, if Level 2 is sufficiently strong it can cause you to discount some Level 1 considerations.
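A minimal sketch of how such a weighted balance might work, assuming each level scores the available options and the weights shift with circumstances (all names and numbers here are invented for illustration):

```python
# Each level scores the available options; the final choice maximizes the
# weighted sum of scores. The weights are circumstance-dependent.
def decide(scores, weights):
    options = scores[1].keys()
    return max(options, key=lambda o: sum(weights[lvl] * scores[lvl][o]
                                          for lvl in weights))

scores = {
    1: {"work": 0.9, "rest": 0.2},  # contemplation/rationality
    2: {"work": 0.3, "rest": 0.7},  # instinct and gut feeling
    3: {"work": 0.1, "rest": 0.9},  # elementary needs
    4: {"work": 0.5, "rest": 0.5},  # naive introspection
}

print(decide(scores, {1: 0.5, 2: 0.2, 3: 0.2, 4: 0.1}))  # 'work': Level 1 dominates
print(decide(scores, {1: 0.3, 2: 0.5, 3: 0.1, 4: 0.1}))  # 'rest': a strong Level 2 overrides
```

With the default weights, Level 1 carries the day; shifting weight to Level 2 flips the decision, which is the discounting effect described above.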