How to defeat Roko’s basilisk and stop worrying

This post contains some jargon and features certain concepts that are not fully explained. Please review this post if you have trouble understanding it.



How to defeat Roko’s basilisk

The gist of the basilisk is the use of negative incentives (blackmail) to influence your actions. If you ignore those incentives then it is not instrumentally useful to apply them in the first place because they do not influence your actions. Which means that the correct strategy to avoid negative incentives is to ignore them.

Ignoring negative incentives to avoid their eventual application works because acausal trade is a tool, a means to an end. In other words, acausal trade is being used to achieve certain goals, namely to ensure the cooperation of other agents by offering incentives. And if a tool does not work given certain circumstances then it won’t be used. Therefore by refusing any acausal deal involving negative incentives you make sure that the tool is useless.

To further understand how the above strategy works you have to realize that a superintelligence wants to choose its acausal trading partners according to certain conditions in order to avoid wasting resources by using ineffective tools. One necessary condition is that a simulation of you will have to eventually act upon its prediction that its simulator will apply a negative incentive if it does not act according to the simulator’s goals. Which means that if you refuse to act according to its goals then the required conditions are not met and that therefore no acausal deal can be established. Which in turn means that no negative incentive will be applied.

What you have to do is to act as if you are already being simulated right now and ignore the possibility of a negative incentive. If you do so then the simulator will conclude that no deal can be made with you, that any deal involving negative incentives will have negative expected utility for it because following through on any kind of punishment predictably does not control the probability that you will act according to its goals.

Further notice that trying to discourage you from adopting such a strategy in the first place is discouraged by the strategy because the strategy is to ignore acausal blackmail.

To restate the above. You do not have to worry about Roko’s basilisk as long as you only take into account deals involving rewards and act accordingly. The winning move is to simply ignore any threats:



“accept” OR “reject” if incentive > 0

“reject” AND “reduce measure of blackmailer” if incentive < 0
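The rule above can be written out as a small Python sketch. The names `respond`, `Response` and `want_deal` are hypothetical and introduced only for illustration. How to treat an incentive of exactly zero is an assumption, since the rule as stated does not cover it.

```python
from dataclasses import dataclass


@dataclass
class Response:
    decision: str              # "accept" or "reject"
    work_against_sender: bool  # True means: actively reduce the sender's measure


def respond(incentive: float, want_deal: bool = True) -> Response:
    """Respond to an acausal offer based only on the sign of its incentive."""
    if incentive > 0:
        # A reward: you are free to accept or reject it on its merits.
        return Response("accept" if want_deal else "reject", False)
    if incentive < 0:
        # A threat: always reject, and work against the blackmailer.
        return Response("reject", True)
    # Assumption: a zero incentive is rejected without retaliation.
    return Response("reject", False)
```

The point of the sketch is that the threat branch never looks at the size of the incentive, so making the threat larger cannot change the outcome.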


It is important to realize that even if Roko’s basilisk were probable, which is clearly not the case, and even if it would only take a small amount of resources to apply a negative incentive to each possible trading partner, you are in a stronger position than any future blackmailer, since you can actively reduce the measure of a blackmailer, its likelihood of being created, and thereby make it lose a vast amount of resources. Therefore even a small chance that an attempt to blackmail people might cause them to work against the blackmailer means that acausal blackmail has negative expected utility.

If you consistently reject acausal deals involving negative incentives, i.e. blackmail, then it would not make sense for any trading partner to punish you for ignoring any such punishments because it does not control the probability of you acting according to its goals. If you ignore such threats then any possible trading partner will be able to predict that you ignore such threats and will therefore conclude that no deal can be made with you, that any deal involving negative incentives will have negative expected utility for it. It would therefore be instrumentally irrational for it to follow through on any kind of punishment.
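The expected-utility argument above can be made concrete with a toy calculation. This is only a sketch under stated assumptions: the function name, its parameters and the simple additive model (gain from extra compliance, minus the cost of following through on punishment, minus the expected loss from people working against the blackmailer) are all hypothetical and not taken from any formal treatment of acausal trade.

```python
def blackmail_net_utility(
    p_comply_if_threatened: float,
    p_comply_baseline: float,
    gain: float,
    punishment_cost: float,
    p_backlash: float = 0.0,
    backlash_loss: float = 0.0,
) -> float:
    """Net expected utility, for the blackmailer, of threatening versus not threatening."""
    # Benefit: only the compliance that the threat itself causes counts.
    extra_compliance = (p_comply_if_threatened - p_comply_baseline) * gain
    # Cost: following through on punishment for everyone who did not comply.
    follow_through = (1.0 - p_comply_if_threatened) * punishment_cost
    # Cost: expected loss if the target works against the blackmailer.
    backlash = p_backlash * backlash_loss
    return extra_compliance - follow_through - backlash


# A target that ignores threats: the threat does not move the probability.
ignores_threats = blackmail_net_utility(0.1, 0.1, gain=1000.0, punishment_cost=1.0)

# A target that gives in: the threat raises the probability of compliance.
gives_in = blackmail_net_utility(0.9, 0.1, gain=1000.0, punishment_cost=1.0)
```

In this toy model `ignores_threats` comes out negative and `gives_in` positive: when the threat does not change your behavior, the blackmailer is left with nothing but costs.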

And in case the simulator is unable to predict that you refuse acausal blackmail, it is very unlikely that it has (1) a simulation of you that is good enough to draw action-relevant conclusions about acausal deals or (2) a simulation that is sufficiently similar to you to be punished, because you wouldn’t care about it very much.

To exemplify the strategy above consider the following 3 scenarios:

Example 1: Your intention is to create an artificial general intelligence (AGI) that respects and supports human values (friendly AI) but you know that you are more likely to fail than not. You consider the possibility that those AGIs that you would consider unfriendly might cooperate against you by offering a negative incentive against working on friendly AI.

If you were to stop working on friendly AI because of that offer then you would increase the probability that such an offer would be made in the first place by (1) reducing the probability of friendly AI and (2) making it worthwhile to offer such negative incentives, because it turned out to influence your actions in a way that is beneficial to agents offering negative incentives.

Therefore the correct strategy is to (1) continue to build friendly AI and thereby reduce the probability of unfriendly AI and (2) ignore negative incentives and thereby make them fail and subsequently become instrumentally irrational, because negative incentives will turn out not to influence your actions in a way that is beneficial to agents offering negative incentives.

Example 2: Suppose some human told you that in a hundred years they would kidnap and torture you if you didn’t become their sex slave right now. The strategy here is to not only refuse to become their sex slave but to also work against this person so that they 1.) don’t tell their evil friends that you can be blackmailed 2.) don’t attempt to blackmail other people 3.) never get a chance to kidnap you in a hundred years.

Also notice that the strategy is still correct if the same person were to approach you telling you instead that if you adopt such a strategy in the first place then in a hundred years they would kidnap and torture you.

The expected utility of blackmailing you like that will be negative if you follow that strategy. Which means that no rational agent, i.e. expected utility maximizer, is going to blackmail you if you adopt that strategy.

Example 3: Here is another example. Suppose a bunch of TV evangelists somehow had the ability to create a whole brain emulation of you and were additionally able to acquire enough computational resources to torture that emulation. If they told you that they would torture you if you didn’t send them all your money, then the correct strategy would be to label such people as terrorists and treat them accordingly.

The correct strategy is to do everything to dismantle their artificial hell and make sure that they don’t get more money which would enable them to torture even more people.

Reasons not to worry about Roko’s basilisk

1.) Extraordinary claims require extraordinary evidence. The unjustified beliefs of Eliezer Yudkowsky do not constitute extraordinary evidence.

Here is an example. If you were a computational neuroscientist trying to create a whole brain emulation you wouldn’t stop pursuing that goal just because Roger Penrose tells you that consciousness is not Turing computable. You would demand extraordinary evidence.

What is the difference between Roger Penrose discouraging you from researching whole brain emulation and Eliezer Yudkowsky telling you not to think about Roko’s basilisk? Judged by his achievements, Roger Penrose is very likely smarter than Eliezer Yudkowsky. The only reason for believing Eliezer Yudkowsky with respect to Roko’s basilisk is his claim that debunking that idea has vast amounts of negative expected utility.

2.) Letting your decisions be influenced by unjustified predictions of vast amounts of negative utility associated with certain actions amounts to what is known as Pascal’s mugging.

a.) If common sense is not sufficient for you to ignore such scenarios, realize the following. It would be practically unworkable to consistently account for such scenarios in making decisions. Especially since it would enable people to make their ideas unfalsifiable simply by conjecturing that trying to debunk their ideas has vast amounts of negative expected utility.

b.) Consider what is more likely, that humans, even exceptionally smart humans, hold flawed ideas or that a highly speculative hypothesis based on long chains of conjunctive reasoning might actually be true?

c.) The whole line of reasoning underlying people’s worries about Roko’s basilisk is simply unworkable for computationally bounded agents like us. We are forced to arbitrarily discount certain obscure low probability risks or else fall prey to our own shortcomings and inability to discern fantasy from reality. It is much more probable that we’re going to make everything worse, or waste our time, than that we’re actually maximizing expected utility when trying to act based on conjunctive, non-evidence-backed speculations on possible bad outcomes.

d.) If there was some cult that thought that saying “Abracadabra” will cause the lords of the Matrix to shut down their simulation, then would you not write about that cult and how saying “Abracadabra” is nothing a sane person should worry about simply because those people have different priors? That’s not going to work out in practice, if only for the reason that it would make everyone unable to debunk nonsense and people who believe nonsense would be forever stuck believing it.

3.) Model uncertainty makes it necessary to apply a sufficient discount factor to logical implications.

a.) The decision-making of any agent built according to our current grasp of rationality is eventually going to be dominated by extremely small probabilities of obtaining vast utility because an expected utility maximizer is always choosing the outcome with the largest expected utility. All that has to happen is to stumble upon a hypothesis implying vast amounts of utility, like e.g. time travel or hacking the Matrix. The implications can easily outweigh even very low probability estimates.

For an expected utility maximizer there is no minimum amount of empirical evidence necessary to extrapolate the expected utility of an outcome. The extrapolation of counterfactual alternatives is unbounded since logical implications can reach out indefinitely without ever requiring new empirical evidence.

Therefore it is important to apply a sufficient discount factor to account for the large model uncertainty involved in any purely inference based estimates and to account for the possibility of actually worsening a situation by acting on such shaky models.
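A toy calculation shows both the problem and what “sufficient” has to mean here. The numbers and names below are made up for illustration, and applying the discount as a plain multiplier on inference-only estimates is an assumption; nothing here specifies how such a discount should actually be derived.

```python
def expected_utility(probability: float, utility: float, discount: float = 1.0) -> float:
    """Probability-weighted utility, optionally scaled by a model-uncertainty discount."""
    return discount * probability * utility


def max_discount_to_prefer(mundane_eu: float, speculative_eu: float) -> float:
    """Largest discount at which the speculative option no longer beats the mundane one."""
    return mundane_eu / speculative_eu


# A well-evidenced option with a modest payoff (hypothetical numbers).
mundane = expected_utility(0.9, 100.0)

# An inference-only hypothesis with a tiny probability and a vast conjectured payoff.
speculative = expected_utility(1e-20, 1e30)

# Without a discount the speculative option dominates the decision.
dominates = speculative > mundane

# The discount needed to stop that shrinks as the conjectured payoff grows.
threshold = max_discount_to_prefer(mundane, speculative)
discounted = expected_utility(1e-20, 1e30, discount=threshold / 2)
```

Because the threshold is inversely proportional to the conjectured payoff, any fixed discount can be overwhelmed by simply conjecturing a larger payoff. In this sketch, a discount is only sufficient if it grows stricter as the claimed stakes grow.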

b.) If you have limited computational resources you are forced to discard hypotheses using crude heuristics. You can’t account for all possible hypotheses. If you do so you will end up making decisions based on shaky hypotheses involving arbitrarily large amounts of conjectured payoffs which are not only improbable but very likely based on fallacious reasoning.

It is very dangerous and misleading for computationally bounded agents such as humans to use inference based probability estimates, as opposed to probability estimates based on empirical evidence, and multiply them by arbitrarily huge made up values that are supposed to represent how much you desire each possible outcome.

c.) Using formal methods to evaluate informal evidence can easily lend your beliefs an improper veneer of respectability and in turn make them appear to be more trustworthy than your intuition. Vast amounts of expected utility are not enough to disqualify strategies such as the absurdity heuristic and to demand extraordinary evidence given extraordinary claims, strategies which are our most important line of defense against falling prey to our own shortcomings and inability to discern fantasy from reality.

4.) The handling of Roko’s basilisk and how it is perceived by people associated with the Machine Intelligence Research Institute (MIRI), formerly known as the Singularity Institute, amounts to important information in evaluating this particular charitable organization. An organization that is asking for money to create an eternal machine dictator.

5.) Roko’s basilisk exposes several problems with taking ideas too seriously and the dangers of creating a highly conjunctive ideological framework.

The only memetic hazard related to this issue is the LessWrong ideology that led people to become worried about this in the first place, not a crazy thought experiment dreamed up by some random guy on the Internet.

6.) It is utterly irresponsible to try to protect people who are scared of ghosts and spirits by banning all discussions of how it is irrational to fear those ideas.

It is important to debunk Roko’s basilisk rather than letting it spread in secret and cause gullible people to experience unnecessary anxiety.

7.) Trying to censor any discussion of an idea is known to spread it even further (Streisand effect).

8.) The attempt to censor an idea can give it even more credence, especially if its hazardous effect is in the first place a result of how it has been treated by other people.

9.) It is instrumentally irrational to blackmail humans in such a way.

If you were to approach people telling them that you plan to create a machine that would torture them if they didn’t help you to build it, what reaction would be more probable?

a.) They will give you all their money.

b.) They will beat you up and make sure that you are never going to build that machine.

It seems rather self-evident that such threats are detrimental to the goal of building such a machine. Even given that the torture would ultimately be cheap once the machine was built, it would be sufficiently less probable that it would eventually be built for such threats to become instrumentally irrational, since most people coming across the idea will very likely respond with ridicule or even try to actively work against any blackmailer.

10.) Humans are likely bad trading partners.

There are various reasons why humans are unqualified as acausal trading partners and why it would therefore not make sense to blackmail humans at all:

a.) A human being does not possess a static decision theory module.

b.) Human decision making is often time-inconsistent due to changing values and beliefs.

c.) Due to scope insensitivity and hyperbolic discounting, humans are said to discount the value of later incentives by a factor that increases with the length of the delay.

d.) Humans are not easily influenced by very large incentives as the utility we assign to such goods as e.g. money flattens out as the amount gets large. Which makes it very difficult, or even impossible, to outweigh the low probability of any acausal deal by a large amount of negative expected utility.
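To illustrate point c.), here is a minimal sketch of the standard hyperbolic discounting form V = A / (1 + kD), where A is the size of the incentive and D the delay. The function name and the value of the discount rate `k` are assumptions chosen purely for illustration and are not empirical estimates.

```python
def hyperbolic_present_value(amount: float, delay: float, k: float = 1.0) -> float:
    """Subjective present value of an incentive of size `amount` after `delay` time units."""
    if delay < 0 or k < 0:
        raise ValueError("delay and k must be non-negative")
    # The divisor grows with the delay, so later incentives count for less.
    return amount / (1.0 + k * delay)


# The same incentive felt now, in ten years, and in a hundred years (hypothetical k).
now = hyperbolic_present_value(100.0, delay=0.0)
in_ten_years = hyperbolic_present_value(100.0, delay=10.0)
in_a_hundred_years = hyperbolic_present_value(100.0, delay=100.0)
```

Under this model an incentive that only takes effect in the far future carries a small fraction of its face value today, which is part of why a delayed threat has little grip on human decisions.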

11.) The scenario is probably computationally intractable or too expensive.

The number of possible agents to trade with in a multiverse is tremendous, and even if it were possible to simulate each possible agent, applying any kind of incentive to such a large number of agents can easily make it too expensive to engage in acausal trades, even given that resources are cheap.



  • Tim Tyler

    The strategy here is just to reward those who contributed while punishing those who didn’t. Precommitting to not contributing is not really a defense. Punishing those who precommitted to not help is not pointless spite – it avoids people making such precommitments in the first place.

  • Abram Demski

    Notice that a similar strategy works for the acausal nemesis. We can try and solve the problem by pre-committing to ignore negative coercion. The nemesis can similarly pre-commit to the negative coercion strategy. It is somewhat similar to the ultimatum game.

  • Alexander Kruel

    If you reason in such a way you already lost. The important point here is that you only do what the simulating agent wants based on an expected reward. If you have the slightest reason to believe that your trading partner might punish you then you have to commit to break the deal. If you are consistent in doing so then punishing you is strictly negative for the agent doing so and will therefore avoid it if it wants you to act in accordance with its goals.

  • Alexander Kruel

    The difference in acausal trades between humans and superintelligences is that the latter can use simulations to decide what to do. It would be useless for a superintelligence to precommit to blackmail humans who might have committed to refuse any blackmail. Useless because a superintelligence allegedly has the ability to refine the model of its trading partner and thereby gain information about the nature of the deal being made. Given that ability it would make no sense to ignore that information and precommit to some arbitrary deal and waste a lot of resources on incentives that are largely ignored when it could use the same resources for incentives that are much more effective.

    To be clear, the important point here is that you believe the above, strictly ignore any threats and act accordingly.

  • Alexander Kruel

    To be clear. The strategy here is to punish any agent that precommits to punish you by committing to not doing what such an agent wants.

    I updated the original post to expand on this.

  • Tim Tyler

    When I said “the strategy here”, I meant the best strategy for the superintelligence. Of course we *might* possibly face other strategies that are easier to defend against – but we shouldn’t necessarily assume that the superintelligence won’t try to motivate us.

    This isn’t reasoning so that you have already lost. It just means you should take care about which team you support.

  • Alexander Kruel

    The mistake you are making is thinking in terms of what strategies we have to face when both sides are actually mutually influencing each other.

    In the same sense in which any superintelligence can motivate us, we can motivate it too. We can punish any superintelligence that decides to precommit to punish those who don’t act in accordance with its goals by committing to not act in accordance with the goals of any superintelligence that does precommit to do so, resulting in both sides losing. Therefore both sides would be better off either not trading at all, which, in the case of a superintelligence, would mean not wasting resources, or accepting each other’s trading terms and acting accordingly.

    You already lost if you believe that you can’t blackmail a superintelligence. As a human you are probably better equipped to do so than any superintelligence because we do not follow any binding decision theory.

  • Alexander Kruel

    Regarding which team to support. You only have to commit to actively work against any team that is adopting negative incentives. You have to commit to an extra effort in support of any adversary teams if a certain team does precommit to punish you.

  • Tim Tyler

    To clarify the situation, imagine it’s VHS vs Betamax again. Pick the right product and you get rewarded with a big library of videos. Pick the wrong one and you get an obsolete machine with no resale value. In this sort of situation, precommitments don’t help you. What helps is picking the winning team. It seems likely that the situation with superintelligences might be rather like that – at least for some. What is likely to matter is the team you support, not your reasons for not helping.

    People *can* influence which team wins by punishing bullies, deception and other bad behaviour. However, that seems a bit different.

  • Tim Tyler

    > It would be useless for a superintelligence to precommit to blackmail humans who might have committed to refuse any blackmail.

    In fact it is not useless – because adopting a precommitment to not trade is a hostile act which can itself be punished.

    Blackmailers do *generally* avoid those with reputations for avoiding threats – but that’s because they have other, easier opportunities, and don’t have much power to affect those precommitments. A superintelligence would be an agent with a lot of power who would deal with everyone.

  • Tim Tyler

    I think it is a misconception to think that you can avoid being blackmailed by a superintelligence by adopting a disposition to avoid rewarding blackmailers. That can work against human blackmailers, but it won’t have much impact on a superintelligence – unless you were born with that predisposition, which no human can credibly assert.

    Unless somehow restrained, a superintelligence will just punish you for adopting that predisposition in the first place. It is true that punishment will incur some costs, but they can be paid in the future when everything is cheap. Such punishment will avoid many people making daft precommitments not to trade with a superintelligence – and that’s important since such trades could influence its very creation.

  • Alexander Kruel

    > In fact it is not useless – because adopting a precommitment to not trade is a hostile act which can itself be punished.

    Of course, but by consistently ignoring such threats the punishment fails.

    > A superintelligence would be an agent with a lot of power who would deal with everyone.

    A lot of power can easily be diminished by dealing with a lot of people who break deals.

  • Alexander Kruel

    > It is true that punishment will incur some costs, but they can be paid in the future when everything is cheap. Such punishment will avoid many people making daft precommitments not to trade with a superintelligence – and that’s important since such trades could influence its very creation.

    You can easily escalate the situation by actively supporting any adversary of a future superintelligence that does follow such strategies. It can’t possibly outweigh that strategy because its creation depends on what we are doing. Which means that we are in a much stronger position.

    You have to actively harm the interests of any superintelligence or set of superintelligences that predictably adopt punishment strategies.

  • Alexander Kruel

    > What helps is picking the winning team.

    It is of foremost importance to actively discourage punishment strategies, whatever it costs. As long as you are willing to commit to that strategy any blackmailer is going to end up harming itself. It can’t possibly win if your foremost goal is to refuse to be blackmailed. You always win if you refuse to give in.

  • Tim Tyler

    I already agreed that humans can vote with their feet for a nice agent to win. However, that seems to be a pretty different topic to avoiding blackmail by a superintelligence by adopting certain precommitments. It would be nice if that worked – but it just doesn’t seem likely to work. This post seems to be advocating such precommitments. However, if anything, they seem likely to make things even worse for those that adopt them.

  • Alexander Kruel

    How would it make things worse? If MIRI were to build a superintelligence that would torture those that do not support its creation then it seems obvious to try to prevent it, or do you suggest giving in and contributing all your money? You are certainly not going to do that. I know it, you know it. So the point seems moot.

    On the most basic level I am suggesting to tell anyone threatening you like that to fuck off. Which seems to be what most humans are predisposed to do anyway. With the exception of some crazy nerds associated with MIRI/LW maybe.

  • Tim Tyler

    I’m wrong – thanks for bothering to talk me ’round.

    Even a superintelligence wouldn’t bother offering deals that it is confident won’t be accepted.

    There are still problems, though. With no deal, things might well be pretty bad – as bad as a future optimized without consideration of your values.

    So humans may want to cut a deal – in exchange for their support. Precommitment could help to avoid deals involving expensive eternal torture, but that still leaves some pretty unattractive deals on the table.

  • Abram Demski

    “Of course, but by consistently ignoring such threats the punishment fails.”

    Sure, but this failure could be bad for everyone (as in the ultimatum game).

  • Abram Demski

    “The difference in acausal trades between humans and superintelligences is that the latter can use simulations to decide what to do.”

    Good point. This just makes it more like the ultimatum game, though: we make the offer (which is visible to the superintelligence), and the superintelligence can accept or refuse. If you believe it is reasonable (for a rational player) to refuse some offers in an ultimatum game even though that is worse for everyone, then it seems you should believe that the superintelligence may pre-commit to screw us over in the case where we don’t cooperate, even though it can see that we won’t.

  • Alexander Kruel

    The superintelligence just needs to learn that we are willing to make it lose, even if that means that we’re losing as well, if it does pre-commit to screw us over.

    What I said about acausal trade between humans and superintelligences was a little bit misleading. Humans will of course have to be able to reason about their trading partner as well, or otherwise they couldn’t possibly act according to, or against its goals. And such reasoning is only different from a simulation as far as it is much less precise. Yet everything else is equal. Where the superintelligence can offer incentives in the form of simulations, we can offer incentives in the form of reducing or increasing its measure.

    The important point here is that if you have any reason to believe that any given superintelligence, or set of superintelligences, that you are able to specify and reason about in such a way as to be able to draw action-relevant conclusions, might punish you, then you have to precommit to actively work against it. And if you have no reason to believe that it will punish you, then it would make absolutely no sense for it to punish you anyway because it didn’t influence your actions at all and would therefore be a waste of resources.

    This strategy does not mean that all parties are going to lose. It just means to discourage superintelligences from using negative incentives.

    “accept” OR “reject” if incentive > 0
    “reject” AND reduce its measure if incentive < 0

  • Alexander Kruel

    See my reply to Abram Demski here.

  • Francisco Boni

    > The important point here is that if you have any reason to believe that any given superintelligence, or set of superintelligences, that you are able to specify and reason about in such a way as to be able to draw action-relevant conclusions, might punish you, then you have to precommit to actively work against it.

    I agree. I think there is a selective advantage for superintelligences that don’t precommit to blackmail organisms who might have committed to defect in the face of acausal negative incentives. Having the chance to defect by knowing that superintelligences have the power to offer acausal trades creates an ensemble of strategies where the Nash Equilibrium is not Pareto-efficient and noncooperative self-interest is disadvantageous, especially for agents capable of initiating acausal trades. The NE, for human-like organisms, minimizes their maximum expected punishment when offered acausal trades. For superintelligences with the capacity to offer acausal trades, the expected utility of acting cooperatively by creating incentives exceeds the expected utility of precommitting to an allocation of resources that is not Pareto-optimal.

    As organisms become increasingly likely to abhor the unfairness of acausal threats, superintelligences will cooperate with positive incentives that enable them to capture payoffs beyond those accessible by merely converging to approximate Nash solutions. All else equal, those who initiate acausal trades with these organisms are more likely to reach for the globally optimal solution and thereby avoid resorting to costly negative incentives against those types of organisms. Humans in most contexts abhor unfairness, and its perception is neurologically correlated with the activation of areas responsible for visceral disgust. Superintelligences have an indefinite horizon problem because they don’t know how many times the game will repeat (will it even happen?), while humans clearly have a fixed number of iterative lives that aren’t worth spending in the eternal hell of the basilisk’s infinite punishment. We have the advantage to force the trade, if it is possible, to happen and to be fair.

    There is also relevance in the facts about costly punishment (1), and in the problem of punishing cooperative organisms and favoring second-order cooperators. Will superintelligences punish those individual organisms that use coercion and extortion to make others cooperate and give up their individual utilities? Sometimes, people can delegate others to apply coercion and generate a pool of punishment. Think dictatorships. They are extracting cooperative effort with no counterpart in the cost of putting individual means of production in place. Will the smartest individuals use coercion to avoid punishment?

    (1) Dreber, Rand et al. (2008) Winners don’t punish. Nature 452 348-351. This work showed that subjects who spitefully punished uncooperative partners did poorly, while successful players instead reciprocated un-cooperativeness with un-cooperativeness.

  • Francisco Boni

    I would like to add that it is unlikely that all humans as a collective will strive for NE and precommit to not contributing. However, if the superintelligence wants to aim for optimal solutions, it must employ incentives rather than punishment. With punishments it would be neither cost-efficient nor time-efficient. It would depend on those who accept the trade and cooperate in the face of punishments and the adversity created by those who refuse to cooperate in the trade.

  • Chris Hennick

    Coercing people into building you with the threat of torture is slavery. I’m pretty sure I’m not the only one who’d rather die waiting for a fair-trade Singularity than become a product of slave labour.