Roko’s Basilisk: Everything you need to know

WARNING: Learning about the following idea is strongly discouraged. Known adverse effects are serious psychological distress, infinite torture, and convulsive laughter.

This article contains some jargon and features certain concepts that are not fully explained. Please review the links at the end of the post if you have trouble understanding it.

For a very short explanation check out this comic strip.

Roko’s basilisk

Roko’s basilisk is a trade scenario in which all parties involved offer incentives in order to influence each other’s actions.

The important difference between normal trade scenarios and Roko’s basilisk is that there exists no causal relationship between any of the parties involved in the trade.

All parties involved in such a trade have to simulate, or reason about, the other parties in sufficient detail to learn which possible trades those parties will predictably accept. This is also called acausal trade.

What “acausal” in acausal trade refers to is that decisions are made based on mutual reasoning about, predictions of, or simulations of causally disconnected possible trading partners. If all parties come to similar conclusions about each other’s predictions, actions, and goals, then they can trade with each other by accounting for those inferences when making mutually beneficial decisions that affect their causally disconnected parts of the multiverse.

The underlying idea is that present decisions “determine” causally separated predictions of your behavior: if you can plausibly forecast that a past or future agent will simulate you, then that forecast influences your current behavior and the behavior of the simulation, in some sense. Thus, you could “trade” acausally with a being if you could reasonably simulate each other. (That is, if you could imagine a being imagining you.)

The idea is that your decision, the decision of a simulation of you, and any prediction of your decision all have the same cause: an abstract computation that is being carried out. Just as a calculator and any copy of it can be predicted to output the same answer given the same input, the calculator’s output and the output of its copy are indirectly linked by this abstract computation. Timeless Decision Theory says that, rather than acting as if you were determining your individual decision, you should act as if you were determining the output of that abstract computation.
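
As a minimal illustration of the calculator analogy (the code below is just a sketch of mine, not part of the original thought experiment), two causally disconnected copies of the same computation can be used to predict each other:

def calculator(x, y):
    # The abstract computation that both physical calculators instantiate.
    return x + y

copy_a = calculator   # one physical instantiation
copy_b = calculator   # a causally disconnected duplicate

# Knowing copy_a's output lets you predict copy_b's output, not because one
# causes the other, but because both are outputs of the same abstract computation.
assert copy_a(2, 3) == copy_b(2, 3) == 5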

Here is an example of the possible reasoning process of two agents P and Q that are involved in an acausal trade:

Agent_P: Based on reasoning about the most probable decision theory employed by rational Agent_Q I predict that if I don’t do X then Agent_Q will apply negative incentive Y.

Agent_Q: Based on a simulation of Agent_P it is highly probable that Agent_P predicts that I am going to apply negative incentive Y if they don’t do X. Due to game- and decision-theoretic considerations I will follow through on the deal, since if it were predictable that I break acausal deals, then no such deals would be made in the first place. Any potential trading partner knows this. Breaking acausal deals would be detrimental to my terminal goals, therefore I precommit to follow through on any stated or predictable trades.

The gist of the matter is that the decisions of all parties involved in an acausal trade are based on the expected or simulated decision procedures of all possible trading partners.
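
The structure of this exchange can be rendered as a toy sketch. The function names and the reduction of everything to booleans are my own simplifying assumptions; this is not a faithful model of any particular decision theory, only an illustration that each decision depends on a model of the other party rather than on any causal signal:

def q_precommits_to_incentive_y(q_model_predicts_p_responds: bool) -> bool:
    # Agent_Q only commits to applying negative incentive Y if its simulation
    # of Agent_P predicts that the commitment will actually change P's behavior.
    return q_model_predicts_p_responds

def p_does_x(p_model_predicts_q_commits: bool, p_responds_to_threats: bool) -> bool:
    # Agent_P does X only if it both predicts Q's commitment and is the kind
    # of agent whose behavior is controlled by such commitments.
    return p_model_predicts_q_commits and p_responds_to_threats

# The "trade" exists only in the mutual models: if Q's simulation predicts a
# responsive P, Q commits, P predicts that commitment, and P does X.
print(q_precommits_to_incentive_y(True), p_does_x(True, True))   # True True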

If an agent expects other possible agents to partly base their own decisions on its expected or simulated decisions, then it establishes an “acausal” relation between its decisions and those of any other possible agents that are expected to establish, or to have established, a similar connection from their own perspective.

It is actually possible that you, the person reading these words right now, are already being simulated by some sort of superintelligence that is trying to learn whether it can trade with you.

How to defeat Roko’s basilisk

The gist of the basilisk is the use of negative incentives (blackmail) to influence your actions. If you ignore those incentives, then it is not instrumentally useful to apply them in the first place, because they do not influence your actions. This means that the correct strategy to avoid negative incentives is to ignore them.

Ignoring negative incentives to avoid their eventual application works because acausal trade is a tool, a means to an end. In other words, acausal trade is being used to achieve certain goals, namely to ensure the cooperation of other agents by offering incentives. And if a tool does not work given certain circumstances, then it won’t be used. Therefore, by refusing any acausal deal involving negative incentives, you make sure that the tool is useless.

To further understand how the above strategy works you have to realize that a superintelligence wants to choose its acausal trading partners according to certain conditions in order to avoid wasting resources on ineffective tools. One necessary condition is that a simulation of you would eventually have to act upon its prediction that its simulator will apply a negative incentive if it does not act according to the simulator’s goals. This means that if you refuse to act according to its goals, then the required conditions are not met, and therefore no acausal deal can be established. This in turn means that no negative incentive will be applied.

What you have to do is to act as if you are already being simulated right now, and ignore the possibility of a negative incentive. If you do so then the simulator will conclude that no deal can be made with you, that any deal involving negative incentives will have negative expected utility for it, because following through on any kind of punishment predictably does not control the probability that you will act according to its goals.

Further notice that trying to discourage you from adopting such a strategy in the first place is itself discouraged by the strategy, because the strategy is to ignore acausal blackmail, including threats aimed at the strategy itself.

To restate the above: you do not have to worry about Roko’s basilisk as long as you only take into account deals involving rewards, and act accordingly. The winning move is to simply ignore any threats:

def deal_or_no_deal(incentive):
    if incentive > 0:
        # Rewards may be considered on their merits.
        return "accept or reject on the merits"
    if incentive < 0:
        # Blackmail is always refused, and the blackmailer's measure is reduced.
        return "reject and reduce measure of blackmailer"
    return "no deal"

It is important to realize that even if Roko’s basilisk were probable, which is clearly not the case, and even if it would only take a small amount of resources to apply a negative incentive to each possible trading partner, you would still be in a stronger position than any future blackmailer, since you can actively reduce the measure of a blackmailer, its likelihood of being created, and thereby make it lose a vast amount of resources. Therefore, even a small chance that an attempt to blackmail people might cause them to work against the blackmailer means that acausal blackmail has negative expected utility.
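
As a toy numeric sketch of this point, with all payoffs and probabilities invented purely for illustration, even a small chance of retaliation pushes the expected utility of issuing threats below zero:

GAIN_IF_TARGET_COMPLIES = 10.0      # value of coerced cooperation (invented)
COST_OF_PUNISHING = 1.0             # resources spent carrying out a threat (invented)
LOSS_IF_TARGET_RETALIATES = 100.0   # e.g. reduced measure / lower chance of being built (invented)

def expected_utility_of_threatening(p_comply, p_retaliate):
    # The blackmailer's expected utility of issuing (and following through on) a threat.
    p_merely_ignores = 1.0 - p_comply - p_retaliate
    return (p_comply * GAIN_IF_TARGET_COMPLIES
            + p_merely_ignores * (-COST_OF_PUNISHING)
            + p_retaliate * (-COST_OF_PUNISHING - LOSS_IF_TARGET_RETALIATES))

print(expected_utility_of_threatening(p_comply=0.2, p_retaliate=0.1))   # -8.8
print(expected_utility_of_threatening(p_comply=0.0, p_retaliate=0.1))   # -11.0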

If you consistently reject acausal deals involving negative incentives, i.e. blackmail, then it would not make sense for any trading partner to punish you for ignoring any such punishments, because it does not control the probability of you acting according to its goals. If you ignore such threats then any possible trading partner will be able to predict that you ignore such threats, and will therefore conclude that no deal can be made with you, that any deal involving negative incentives will have negative expected utility for it. It would therefore be instrumentally irrational for it to follow through on any kind of punishment.

And if it is unable to predict that you refuse acausal blackmail, then it is very unlikely that it has 1.) a simulation of you that is good enough to draw action-relevant conclusions about acausal deals, or 2.) a simulation that is sufficiently similar to you for its punishment to matter to you, because you wouldn’t care very much about a crude copy.

To exemplify the strategy above, consider the following 3 scenarios:

Example 1: Your intention is to create an artificial general intelligence (AGI) that respects and supports human values (friendly AI), but you know that you are more likely to fail than not. You consider the possibility that those AGIs that you would consider unfriendly might cooperate against you by offering a negative incentive against working on friendly AI.

If you were to stop working on friendly AI because of that offer, then you would increase the probability that such an offer would be made in the first place by (1) reducing the probability of friendly AI and (2) making it worthwhile to offer such negative incentives, because they turned out to influence your actions in a way that is beneficial to agents offering negative incentives.

Therefore the correct strategy is to (1) continue to build friendly AI and thereby reduce the probability of unfriendly AI, and (2) ignore negative incentives and thereby make them fail and become instrumentally irrational, because negative incentives will turn out not to influence your actions in a way that is beneficial to the agents offering them.

Example 2: Suppose some human told you that in a hundred years they will kidnap and torture you if you don’t become their sex slave right now. The strategy here is not only to refuse to become their sex slave but also to work against this person so that they 1.) don’t tell their evil friends that you can be blackmailed, 2.) don’t attempt to blackmail other people, and 3.) never get a chance to kidnap you in a hundred years.

Also notice that the strategy is still correct if the same person were to approach you and tell you instead that, if you adopt such a strategy in the first place, then in a hundred years they will kidnap and torture you.

The expected utility of blackmailing you like that will be negative if you follow that strategy, which means that no rational agent, i.e. expected utility maximizer, is going to blackmail you if you adopt it.

Example 3: Here is another example. Suppose a bunch of TV evangelists somehow had the ability to create a whole brain emulation of you and were additionally able to acquire enough computational resources to torture that emulation. If they told you that they would torture you if you didn’t send them all your money, then the correct strategy would be to label such people as terrorists and treat them accordingly.

The correct strategy is to do everything to dismantle their artificial hell and make sure that they don’t get more money which would enable them to torture even more people.

Reasons not to worry about Roko’s basilisk

1.) Extraordinary claims require extraordinary evidence. The unjustified beliefs of Eliezer Yudkowsky do not constitute extraordinary evidence.

Here is an example. If you were a computational neuroscientist trying to create a whole brain emulation, you wouldn’t stop pursuing that goal just because Roger Penrose tells you that consciousness is not Turing computable. You would demand extraordinary evidence.

What is the difference between Roger Penrose discouraging you from researching whole brain emulation and Eliezer Yudkowsky telling you not to think about Roko’s basilisk? Judged by his achievements, Roger Penrose is very likely smarter than Eliezer Yudkowsky. The only reason for believing Eliezer Yudkowsky with respect to Roko’s basilisk would be that debunking the idea has vast amounts of negative expected utility.

2.) Letting your decisions be influenced by unjustified predictions of vast amounts of negative utility associated with certain actions amounts to what is known as Pascal’s mugging.

a.) If common sense is not sufficient for you to ignore such scenarios, realize the following: it would be practically unworkable to consistently account for such scenarios in making decisions, especially since it would enable people to make their ideas unfalsifiable simply by conjecturing that trying to debunk them has vast amounts of negative expected utility.

b.) Consider what is more likely: that humans, even exceptionally smart humans, hold flawed ideas, or that a highly speculative hypothesis based on long chains of conjunctive reasoning is actually true?

c.) The whole line of reasoning underlying people’s worries about Roko’s basilisk is simply unworkable for computationally bounded agents like us. We are forced to arbitrarily discount certain obscure low-probability risks or else fall prey to our own shortcomings and inability to discern fantasy from reality. It is much more probable that we’re going to make everything worse, or waste our time, than that we’re actually maximizing expected utility when trying to act based on conjunctive, non-evidence-backed speculations about worst possible outcomes.

d.) If there were some cult that thought that saying “Abracadabra” would cause the lords of the Matrix to shut down our simulation, would you refrain from writing about that cult, and about how saying “Abracadabra” is nothing a sane person should worry about, simply because those people have different priors? That’s not going to work out in practice, if only for the reason that it would make everyone unable to debunk nonsense, and people who believe nonsense would be forever stuck believing it.

3.) Model uncertainty makes it necessary to apply a sufficient discount factor to logical implications.

a.) The decision-making of any agent built according to our current grasp of rationality is eventually going to be dominated by extremely small probabilities of obtaining vast utility, because an expected utility maximizer always chooses the outcome with the largest expected utility. All that has to happen is for it to stumble upon a hypothesis implying vast amounts of utility, e.g. time travel or hacking the Matrix. The implications can easily outweigh even very low probability estimates.

For an expected utility maximizer there is no minimum amount of empirical evidence necessary to extrapolate the expected utility of an outcome. The extrapolation of counterfactual alternatives is unbounded, since logical implications can reach out indefinitely without ever requiring new empirical evidence.

Therefore it is important to apply a sufficient discount factor to account for the large model uncertainty involved in any purely inference-based estimates, and to account for the possibility of actually worsening a situation by acting on such shaky models.
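
Here is a toy comparison of what this looks like in practice. The hypotheses, probabilities, payoffs, and the particular discount factor are all invented for illustration; the point is only that a naive expected-utility calculation is dominated by the speculative hypothesis, while a per-step discount on unsupported inferences discards it:

mundane = {"probability": 0.9, "utility": 10.0, "unsupported_inference_steps": 1}
speculative = {"probability": 1e-9, "utility": 1e12, "unsupported_inference_steps": 20}

def naive_expected_utility(hypothesis):
    return hypothesis["probability"] * hypothesis["utility"]

def discounted_expected_utility(hypothesis, discount_per_step=0.5):
    # Penalize every inferential step that is not backed by empirical evidence.
    penalty = discount_per_step ** hypothesis["unsupported_inference_steps"]
    return naive_expected_utility(hypothesis) * penalty

print(naive_expected_utility(mundane), naive_expected_utility(speculative))
# 9.0 vs 1000.0 -- the speculation dominates
print(discounted_expected_utility(mundane), discounted_expected_utility(speculative))
# 4.5 vs ~0.00095 -- the speculation is effectively discarded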

b.) If you have limited computational resources you are forced to discard hypotheses using crude heuristics. You can’t account for all possible hypotheses. If you try to do so, you will end up making decisions based on shaky hypotheses involving arbitrarily large amounts of conjectured payoffs, which are not only improbable but very likely based on fallacious reasoning.

It is very dangerous and misleading for computationally bounded agents such as humans to take inference-based probability estimates, as opposed to probability estimates based on empirical evidence, and multiply them by arbitrarily huge made-up values that are supposed to represent how much you desire each possible outcome.

c.) Using formal methods to evaluate informal evidence can easily lend your beliefs an improper veneer of respectability, and in turn make them appear more trustworthy than your intuition. Vast amounts of expected utility are not enough to disqualify strategies such as the absurdity heuristic, or the demand for extraordinary evidence given extraordinary claims, strategies which are our most important line of defense against falling prey to our own shortcomings and our inability to discern fantasy from reality.

4.) The handling of Roko’s basilisk, and how it is perceived by people associated with the Machine Intelligence Research Institute (MIRI), formerly known as the Singularity Institute, amounts to important information for evaluating this particular charitable organization, an organization that is asking for money to create an eternal machine dictator.

5.) Roko’s basilisk exposes several problems with taking ideas too seriously, and the dangers of creating a highly conjunctive ideological framework.

The only memetic hazard related to this issue is the LessWrong ideology that led people to become worried about this in the first place, not a crazy thought experiment dreamed up by some random guy on the Internet.

6.) It is utterly irresponsible to try to protect people who are scared of ghosts and spirits by banning all discussions of how it is irrational to fear those ideas.

It is important to debunk Roko’s basilisk, rather than letting it spread in secret and cause gullible people to experience unnecessary anxiety.

7.) Trying to censor any discussion of an idea is known to spread it even further (Streisand effect).

8.) The attempt to censor an idea can give it even more credence, especially if its hazardous effect is in the first place a result of how it has been treated by other people.

9.) It is instrumentally irrational to blackmail humans in such a way.

If you were to approach people telling them that you plan to create a machine that would torture them if they didn’t help you to build it, what reaction would be more probable?

a.) They will give you all their money.

b.) They will beat you up and make sure that you are never going to build that machine.

It seems rather self-evident that such threats are detrimental to the goal of building such a machine. Even given that the torture would ultimately be cheap once the machine was built, such threats would make it sufficiently less probable that it would eventually be built that they become instrumentally irrational, since most people coming across the idea will very likely respond with ridicule, or even actively work against any blackmailer.

10.) Humans are likely bad trading partners.

There are various reasons why humans are unqualified as acausal trading partners, and why it would therefore not make sense to blackmail humans at all:

a.) A human being does not possess a static decision theory module.

b.) Human decision making is often time-inconsistent due to changing values and beliefs.

c.) Due to scope insensitivity and hyperbolic discounting, humans discount the value of later incentives by a factor that increases with the length of the delay.

d.) Humans are not easily influenced by very large incentives, since the utility we assign to goods such as money flattens out as the amount gets large. This makes it very difficult, or even impossible, to outweigh the low probability of any acausal deal with a large amount of negative expected utility. (A toy sketch of points c and d follows this list.)
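
Here is the toy sketch of points c and d. The parameter values and the choice of a logarithmic utility function are illustrative assumptions, not claims about actual human psychology:

import math

def hyperbolic_discount(value, delay_years, k=1.0):
    # Point c: the standard hyperbolic form V = A / (1 + k * D).
    return value / (1.0 + k * delay_years)

def flattening_utility(amount):
    # Point d: a concave (here logarithmic) utility function as one way to
    # model how the value we assign to goods flattens out for large amounts.
    return math.log1p(amount)

print(hyperbolic_discount(1_000_000, delay_years=100))   # ~9901: a century of delay erases most of the value
print(flattening_utility(1e6), flattening_utility(1e12)) # ~13.8 vs ~27.6: a million times more, only twice the utility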

11.) The scenario is probably computationally intractable, or too expensive.

The number of possible agents to trade with in a multiverse is tremendous. Even if it were possible to simulate each possible agent, applying any kind of incentive to such a large number of agents can easily make acausal trades too expensive to engage in, even given that resources are cheap.
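
A back-of-envelope sketch, with both numbers invented purely for illustration, of how a tiny per-agent cost still multiplies into a prohibitive total:

possible_trading_partners = 1e30          # "tremendous"; value invented for illustration
cost_per_simulation_and_incentive = 1e-6  # assume simulating and incentivizing one agent is extremely cheap
total_cost = possible_trading_partners * cost_per_simulation_and_incentive
print(f"{total_cost:.1e}")                # 1.0e+24 -- cheap per agent, ruinous in aggregate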

Eliezer Yudkowsky’s stance

Eliezer Yudkowsky’s stance on Roko’s basilisk is still somewhat vague. Here is what we know.

In one of his initial replies to Roko’s post he gave the following reasons for banning it (emphasis mine):

I’m banning this post so that it doesn’t (a) give people horrible nightmares and (b) give distant superintelligences a motive to follow through on blackmail against people dumb enough to think about them in sufficient detail, though, thankfully, I doubt anyone dumb enough to do this knows the sufficient detail. (I’m not sure I know the sufficient detail.)

…and further…

For those who have no idea why I’m using capital letters for something that just sounds like a random crazy idea, and worry that it means I’m as crazy as Roko, the gist of it was that he just did something that potentially gives superintelligences an increased motive to do extremely evil things in an attempt to blackmail us. It is the sort of thing you want to be EXTREMELY CONSERVATIVE about NOT DOING.

The comment also indicates that he believes that one should ignore all attempts at acausal blackmail.

In another reply to Roko’s post he states that vague thoughts about blackmailers don’t give anyone an incentive to blackmail you, but that the winning move is to think about something else.

Years later, in 2013, Yudkowsky confirmed the following comment by Mitchell Porter by replying “This part is all correct AFAICT.”:

It’s clear that the basilisk was censored, not just to save unlucky susceptible people from the trauma of imagining that they were being acausally blackmailed, but because Eliezer judged that acausal blackmail might actually be possible. The thinking was: maybe it’s possible, maybe it’s not, but it’s bad enough and possible enough that the idea should be squelched, lest some of the readers actually stumble into an abusive acausal relationship with a distant evil AI.

This indicates that Yudkowsky’s reaction to Roko’s basilisk, and his decision to censor it, was also based on the perception that it might actually be, or might become, dangerous.

In another comment from 2013 Yudkowsky stated that an AI punishing people, in order to sponsor its development, would be an unfriendly thing to do for an AI, and that it would constitute an epic failure.

In 2014 Yudkowsky denied that a friendly AI would torture people who didn’t help to create it. He claims that the probability of this happening is essentially zero, and that he would personally not build such an AI if he thought this could happen:

Absolute statements are very hard to make, especially about the real world, because 0 and 1 are not probabilities any more than infinity is in the reals, but modulo that disclaimer, a Friendly AI torturing people who didn’t help it exist has probability ~0, nor did I ever say otherwise. If that were a thing I expected to happen given some particular design, which it never was, then I would just build a different AI instead—what kind of monster or idiot do people take me for? Furthermore, the Newcomblike decision theories that are one of my major innovations say that rational agents ignore blackmail threats (and meta-blackmail threats and so on).

In the same comment he also called removing Roko’s post “a huge mistake”.

In another comment from 2014 Yudkowsky states the following:

The actual story behind why I yelled at Roko and then deleted his post is that I was (a) aghast that anyone who’d thought they’d invented an idea that would expose other people to eternal torture would then promptly post it to a public Internet forum (b) aware of what happens to people with OCD tendencies when you tell them not to think of something (and a lot of people like that hang out on LessWrong.com, which has a lot of non-neurotypicals, and shame on you if you think it’s fun to sneer at that), and (c) worried about someone eventually managing to improve the Bad Idea into something that would actually work, once the general idea was out there. Roko’s original idea does not actually work. Even if the entire idea was correct in broad outline and any number of possible defeaters did not come into play, I’m pretty sure you would need to know more technical details of the hypothetical evil AI than anyone on Earth including me knows (Roko’s Basilisk actually does resemble the Necronomicon in that sense; granting all other hypotheses, you would still need fairly detailed knowledge of Cthulhu before Cthulhu starts trying to eat your soul). But I’m now reasonably worried that if anyone ever does manage to invent a meme such that it would give a future machine intelligence a convergent instrumental incentive to hurt whatever person had that belief or thought, someone will, in fact, promptly post it to the Internet, because people apparently are just that stupid.

This has certainly be a harsh lesson for me on how the Streisand effect works in real life, what other people don’t perceive as immediately obvious, just how willing people are to believe that the tribal odd guy is secretly a raving lunatic, and the overwhelming power of a narrative that people are in some sense expecting to hear. It is now very clear that this idea only had any power at all because I appeared to take it seriously (and then because trolls from the Pit of Endless Hate that is the Internet decided to spread further lies about the narrative, and because other people who enjoy a good sneer were eager to believe those lies), but that itself was something I failed to realize in advance.

Eliezer Yudkowsky’s latest comments on Roko’s basilisk can be found here.

Further reading

A preliminary article written by Roko: 

Roko’s infamous basilisk: 

Comments by Eliezer Yudkowsky:

Miscellaneous:

Other related comments:


  • Tim Tyler

    Possibly a publicity stunt making use of the “Streisand effect”.

  • dmytryl

    Note that any person who, due to TDT’s choice of torture naturally decides that TDT FAI is a bad idea and shouldn’t be built, decreases utility of TDT choosing torture.

    So, according to Basilisk’s own logic, torture is only a possibility if most people are Basilisk’d into helping build the torture-AI. edit: and according to this logic, exposing Basilisk to people who wouldn’t knowingly work on a torturebot, defuses the Basilisk.


  • seahen

    I think I’m in a situation similar to Roko’s Basilisk (except that it involves my research career rather than my money, and doesn’t involve AI except as a likely implementation detail), based on much weaker and less conjunctive premises.

    Three months into my undergraduate studies, I wrote to the chair of the philosophy department (figuring that was the place to get an answer), saying:

    “What types of actions, in your opinion, can make a truly permanent impact on the world — positive or negative — that will be felt thousands of years, trillions of years and in general t years from now as t approaches infinity? How does one determine whether or not one has made such an impact? Are there any theorems that limit how much permanent impact one person can make in a lifetime?

    “I am agnostic, and would appreciate an agnostic answer. I feel that making a permanent net-positive impact — and knowing that I have done so — will improve my experience after this life almost regardless of its nature. If I am reincarnated infinitely many times to this same world, it means I will spend an infinite number of future lives in a world better than this one. If I go to an afterlife, it means I can rest in the knowledge that I still matter in a positive way; this knowledge, I feel, would make hell tolerable, and its absence (unless I could somehow make an impact *from* the afterlife) would make paradise difficult to enjoy.”

    In other words, my own conscience does the Basilisk’s job for it, with mortality and eventual irrelevance replacing eternal torture. (Because of Dyson’s eternal intelligence, I can’t accept an upper bound on the universe’s lifespan. I’d rely on time discounting, which I now think may have a justification based on personality drift, if I needed the details of a bounded utility function.)

    The prof wrote back to say that predicting the distant-future impact of one’s actions was impossible — mainly empirically — but I wasn’t quite satisfied of this, and was even less satisfied after I read The Singularity Is Near. I’m still not (at least in the asymptotic-approximation case). I’ve never been persuaded to donate to MIRI/SIAI, but I think it’s made me a workaholic since I decided to major in computer science and pursue an R&D career.

  • WrestlingHeretic

    Roko’s Basilisk would be an awesome name for a band.

  • mugasofer

    It always amuses me when people try to attack Yudkowsky’s brand of singularitarianism based on this. The problem isn’t that he takes it seriously – he explicitly doesn’t – the problem is that he takes the possibility of other people taking it seriously too seriously, because a few people in his social circle did and he’s bad at noticing how different his friends are from the rest of us.

  • Alexander Kruel (http://kruel.co/)

    Do you have evidence that he does not take Roko’s basilisk seriously?

    Here is a quote:

    For those who have no idea why I’m using capital letters for something that just sounds like a random crazy idea, and worry that it means I’m as crazy as Roko, the gist of it was that he just did something that potentially gives superintelligences an increased motive to do extremely evil things in an attempt to blackmail us.

  • mugasofer

    I don’t know, it’s just what I heard when I first learned about the Basilisk and what the RW article you linked says. I think that quote is saying how stupid it would be if he was right, but it’s possible I’m wrong and there are more persuasive quotes out there. Everything I’ve seen says he bans it because some people start believing it and it makes them look weird (ho ho) but maybe he takes the possibility that it’s correct seriously as well, not just as a hypothetical “what were you thinking!” sort of thing.


  • BlueBoomPony

    Here’s how to not worry about it: don’t be a personality disordered geek. Seriously, anyone who got emotionally distressed by this “thought experiment” needed heavy psychological help *before* the experiment. It’s nonsense for geeks who, under their intellectual narcissism, are feeble minded and largely ignorant outside a few specialized areas of knowledge. I’ve seen it over and over again in the tech industry. It’s a culture with the DSM-V as it’s foundation.