This post contains some jargon and features certain concepts that are not fully explained. Please review this post if you have trouble understanding it.
How to defeat Roko’s basilisk
The gist of the basilisk is the use of negative incentives (blackmail) to influence your actions. If you ignore those incentives then it is not instrumentally useful to apply them in the first place, because they do not influence your actions. The correct strategy to avoid negative incentives is therefore to ignore them.
Ignoring negative incentives to avoid their eventual application works because acausal trade is a tool, a means to an end. In other words, acausal trade is being used to achieve certain goals, namely to ensure the cooperation of other agents by offering incentives. And if a tool does not work given certain circumstances then it won’t be used. Therefore by refusing any acausal deal involving negative incentives you make sure that the tool is useless.
To further understand how the above strategy works you have to realize that a superintelligence wants to choose its acausal trading partners according to certain conditions in order to avoid wasting resources on ineffective tools. One necessary condition is that a simulation of you will eventually act upon its prediction that its simulator will apply a negative incentive if it does not act according to the simulator’s goals. If you refuse to act according to its goals, the required conditions are not met and no acausal deal can be established, which in turn means that no negative incentive will be applied.
What you have to do is to act as if you are already being simulated right now and ignore the possibility of a negative incentive. If you do so then the simulator will conclude that no deal can be made with you, that any deal involving negative incentives will have negative expected utility for it because following through on any kind of punishment predictably does not control the probability that you will act according to its goals.
Notice further that any attempt to discourage you from adopting such a strategy in the first place is itself covered by the strategy, because the strategy is to ignore acausal blackmail.
To restate the above: you do not have to worry about Roko’s basilisk as long as you only take into account deals involving rewards and act accordingly. The winning move is to simply ignore any threats:
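def deal_or_no_deal(incentive):
    # Offers of reward are judged on their merits.
    if incentive > 0:
        return "accept or reject"
    # Threats are always rejected and the blackmailer is actively opposed.
    if incentive < 0:
        return "reject and reduce measure of blackmailer"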
It is important to realize that even if Roko’s basilisk were probable, which is clearly not the case, and even if it would only take a small amount of resources to apply a negative incentive to each possible trading partner, you are in a stronger position than any future blackmailer, since you can actively reduce the measure of a blackmailer, i.e. its likelihood of being created, and thereby make it lose a vast amount of resources. Therefore even a small chance that an attempt to blackmail people might cause them to work against the blackmailer means that acausal blackmail has negative expected utility.
If you consistently reject acausal deals involving negative incentives, i.e. blackmail, then it would not make sense for any trading partner to punish you for ignoring any such punishments because it does not control the probability of you acting according to its goals. If you ignore such threats then any possible trading partner will be able to predict that you ignore such threats and will therefore conclude that no deal can be made with you, that any deal involving negative incentives will have negative expected utility for it. It would therefore be instrumentally irrational for it to follow through on any kind of punishment.
And in case the simulator is unable to predict that you refuse acausal blackmail, it is very unlikely that it has (1) a simulation of you that is good enough to draw action-relevant conclusions about acausal deals or (2) a simulation that is sufficiently similar to you to be punished, because you wouldn’t care about it very much.
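The expected utility argument of the preceding paragraphs can be sketched as a toy calculation. This is only an illustration under stated assumptions: the function name, its parameters and all the numbers are made up for this sketch and do not come from any formal treatment of acausal trade. The point it shows is that a threat only pays off through the extra compliance it causes, so against someone who ignores threats the blackmailer is left with nothing but costs.

```python
def blackmail_expected_utility(
    compliance_gain: float,
    p_comply_if_threatened: float,
    p_comply_baseline: float,
    punishment_cost: float,
    p_backlash: float,
    backlash_loss: float,
) -> float:
    """Expected utility, for the blackmailer, of issuing and enforcing a threat.

    All parameters are hypothetical quantities chosen for illustration.
    """
    # The threat only helps through the compliance it adds over the baseline.
    benefit = (p_comply_if_threatened - p_comply_baseline) * compliance_gain
    # Punishment has to be paid for whenever the target does not comply.
    enforcement = (1.0 - p_comply_if_threatened) * punishment_cost
    # The target may work against the blackmailer (reduce its measure).
    backlash = p_backlash * backlash_loss
    return benefit - enforcement - backlash


# Someone who ignores threats: the threat does not change their behaviour,
# and there is a small chance that they actively oppose the blackmailer.
stubborn = blackmail_expected_utility(
    compliance_gain=1000.0,
    p_comply_if_threatened=0.1,
    p_comply_baseline=0.1,
    punishment_cost=1.0,
    p_backlash=0.01,
    backlash_loss=10000.0,
)

# Someone who gives in to threats and never fights back.
yielding = blackmail_expected_utility(
    compliance_gain=1000.0,
    p_comply_if_threatened=0.9,
    p_comply_baseline=0.1,
    punishment_cost=1.0,
    p_backlash=0.0,
    backlash_loss=10000.0,
)

print("ignores threats:", stubborn)
print("yields to threats:", yielding)
```

With these made-up numbers the first value is negative and the second is positive, which is the asymmetry the strategy relies on: blackmail is only worth attempting against people who can be expected to give in.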
To exemplify the strategy above consider the following 3 scenarios:
Example 1: Your intention is to create an artificial general intelligence (AGI) that respects and supports human values (friendly AI) but you know that you are more likely to fail than not. You consider the possibility that those AGIs that you would consider unfriendly might cooperate against you by offering a negative incentive against working on friendly AI.
If you were to stop working on friendly AI because of that offer then you would increase the probability that such an offer would be made in the first place by (1) reducing the probability of friendly AI and (2) making it worthwhile to offer such negative incentives, because they turned out to influence your actions in a way that is beneficial to agents offering negative incentives.
Therefore the correct strategy is to (1) continue to build friendly AI and thereby reduce the probability of unfriendly AI and (2) ignore negative incentives and thereby make them fail and subsequently become instrumentally irrational, because negative incentives will turn out not to influence your actions in a way that is beneficial to agents offering negative incentives.
Example 2: Suppose some human told you that in a hundred years they would kidnap and torture you if you didn’t become their sex slave right now. The strategy here is to not only refuse to become their sex slave but to also work against this person so that they (1) don’t tell their evil friends that you can be blackmailed, (2) don’t attempt to blackmail other people and (3) never get a chance to kidnap you in a hundred years.
Also notice that the strategy is still correct if the same person were to approach you telling you instead that if you adopt such a strategy in the first place then in a hundred years they would kidnap and torture you.
The expected utility of blackmailing you like that will be negative if you follow that strategy. Which means that no rational agent, i.e. expected utility maximizer, is going to blackmail you if you adopt that strategy.
Example 3: Suppose a bunch of TV evangelists somehow had the ability to create a whole brain emulation of you and were additionally able to acquire enough computational resources to torture that emulation. If they told you that they would torture you if you didn’t send them all your money, then the correct strategy would be to label such people as terrorists and treat them accordingly.
The correct strategy is to do everything to dismantle their artificial hell and make sure that they don’t get more money which would enable them to torture even more people.
Reasons not to worry about Roko’s basilisk
1.) Extraordinary claims require extraordinary evidence. The unjustified beliefs of Eliezer Yudkowsky do not constitute extraordinary evidence.
Here is an example. If you were a computational neuroscientist trying to create a whole brain emulation you wouldn’t stop pursuing that goal just because Roger Penrose tells you that consciousness is not Turing computable. You would demand extraordinary evidence.
What is the difference between Roger Penrose discouraging you from researching whole brain emulation and Eliezer Yudkowsky telling you not to think about Roko’s basilisk? Judged by his achievements, Roger Penrose is very likely smarter than Eliezer Yudkowsky. The only reason for believing Eliezer Yudkowsky with respect to Roko’s basilisk is his claim that debunking that idea has vast amounts of negative expected utility.
2.) Letting your decisions be influenced by unjustified predictions of vast amounts of negative utility associated with certain actions amounts to what is known as Pascal’s mugging.
a.) If common sense is not sufficient for you to ignore such scenarios, realize the following. It would be practically unworkable to consistently account for such scenarios in making decisions. Especially since it would enable people to make their ideas unfalsifiable simply by conjecturing that trying to debunk their ideas has vast amounts of negative expected utility.
b.) Consider what is more likely, that humans, even exceptionally smart humans, hold flawed ideas or that a highly speculative hypothesis based on long chains of conjunctive reasoning might actually be true?
c.) The whole line of reasoning underlying people’s worries about Roko’s basilisk is simply unworkable for computationally bounded agents like us. We are forced to arbitrarily discount certain obscure low probability risks or else fall prey to our own shortcomings and inability to discern fantasy from reality. It is much more probable that we’re going to make everything worse, or waste our time, than that we’re actually maximizing expected utility when trying to act based on conjunctive, non-evidence-backed speculations on possible bad outcomes.
d.) If there was some cult that thought that saying “Abracadabra” will cause the lords of the Matrix to shut down their simulation, then would you not write about that cult and how saying “Abracadabra” is nothing a sane person should worry about simply because those people have different priors? That’s not going to work out in practice, if only for the reason that it would make everyone unable to debunk nonsense and people who believe nonsense would be forever stuck believing it.
3.) Model uncertainty makes it necessary to apply a sufficient discount factor to logical implications.
a.) The decision-making of any agent built according to our current grasp of rationality is eventually going to be dominated by extremely small probabilities of obtaining vast utility, because an expected utility maximizer always chooses the outcome with the largest expected utility. All that has to happen is for it to stumble upon a hypothesis implying vast amounts of utility, e.g. time travel or hacking the Matrix. The implications can easily outweigh even very low probability estimates.
For an expected utility maximizer there is no minimum amount of empirical evidence necessary to extrapolate the expected utility of an outcome. The extrapolation of counterfactual alternatives is unbounded since logical implications can reach out indefinitely without ever requiring new empirical evidence.
Therefore it is important to apply a sufficient discount factor to account for the large model uncertainty involved in any purely inference-based estimates and to account for the possibility of actually worsening a situation by acting on such shaky models.
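The arithmetic behind this point can be sketched in a few lines. Everything here is an assumption made for illustration: the two hypotheses, their probabilities and utilities, the count of inferential steps that are not backed by evidence, and the particular discount of 0.1 per such step. It is one possible way to model a discount for model uncertainty, not an established rule.

```python
from dataclasses import dataclass


@dataclass
class Hypothesis:
    name: str
    probability: float      # made-up probability estimate
    utility: float          # made-up payoff if the hypothesis is true
    unsupported_steps: int  # inferential steps with no empirical backing


def naive_value(h: Hypothesis) -> float:
    """Plain expected utility: probability times payoff."""
    return h.probability * h.utility


def discounted_value(h: Hypothesis, discount_per_step: float = 0.1) -> float:
    """Expected utility, shrunk once for every unsupported inferential step."""
    return naive_value(h) * discount_per_step ** h.unsupported_steps


def best(hypotheses, value) -> str:
    """Name of the hypothesis an agent using `value` would act on."""
    return max(hypotheses, key=value).name


everyday = Hypothesis("everyday plan", 0.9, 100.0, 0)
wild = Hypothesis("hack the Matrix", 1e-10, 1e15, 8)
options = [everyday, wild]

print("naive maximizer picks:", best(options, naive_value))
print("discounting maximizer picks:", best(options, discounted_value))
```

With these numbers the naive maximizer is captured by the wild hypothesis, because a probability of one in ten billion is outweighed by the conjectured payoff, while the discounting agent sticks with the everyday plan. Note that a fixed discount only helps as long as the conjectured payoff cannot be inflated faster than the discount shrinks it, which is why the discount has to be sufficient.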
b.) If you have limited computational resources you are forced to discard hypotheses using crude heuristics. You can’t account for all possible hypotheses. If you try to do so you will end up making decisions based on shaky hypotheses involving arbitrarily large amounts of conjectured payoffs which are not only improbable but very likely based on fallacious reasoning.
It is very dangerous and misleading for computationally bounded agents such as humans to use inference-based probability estimates, as opposed to probability estimates based on empirical evidence, and multiply them by arbitrarily huge made-up values that are supposed to represent how much you desire each possible outcome.
c.) Using formal methods to evaluate informal evidence can easily lend your beliefs an improper veneer of respectability and in turn make them appear to be more trustworthy than your intuition. Vast amounts of expected utility are not enough to disqualify strategies such as the absurdity heuristic or demanding extraordinary evidence for extraordinary claims, strategies which are our most important line of defense against falling prey to our own shortcomings and inability to discern fantasy from reality.
4.) The handling of Roko’s basilisk and how it is perceived by people associated with the Machine Intelligence Research Institute (MIRI), formerly known as the Singularity Institute, amounts to important information in evaluating this particular charitable organization, an organization that is asking for money to create an eternal machine dictator.
5.) Roko’s basilisk exposes several problems with taking ideas too seriously and the dangers of creating a highly conjunctive ideological framework.
The only memetic hazard related to this issue is the LessWrong ideology that led people to become worried about this in the first place, not a crazy thought experiment dreamed up by some random guy on the Internet.
6.) It is utterly irresponsible to try to protect people who are scared of ghosts and spirits by banning all discussions of how it is irrational to fear those ideas.
It is important to debunk Roko’s basilisk rather than letting it spread in secret and cause gullible people to experience unnecessary anxiety.
7.) Trying to censor any discussion of an idea is known to spread it even further (Streisand effect).
8.) The attempt to censor an idea can give it even more credence, especially if its hazardous effect is in the first place a result of how it has been treated by other people.
9.) It is instrumentally irrational to blackmail humans in such a way.
If you were to approach people telling them that you plan to create a machine that would torture them if they didn’t help you to build it, what reaction would be more probable?
a.) They will give you all their money.
b.) They will beat you up and make sure that you are never going to build that machine.
It seems rather self-evident that such threats are detrimental to the goal of building such a machine. Even given that the torture would ultimately be cheap once the machine was built, the threats would make it sufficiently less probable that the machine would ever be built for them to become instrumentally irrational, since most people coming across the idea will very likely respond with ridicule or even try to actively work against any blackmailer.
10.) Humans are likely bad trading partners.
There are various reasons why humans are unqualified as acausal trading partners and why it would therefore not make sense to blackmail humans at all:
a.) A human being does not possess a static decision theory module.
b.) Human decision making is often time-inconsistent due to changing values and beliefs.
c.) Due to scope insensitivity and hyperbolic discounting, humans are said to discount the value of later incentives by a factor that increases with the length of the delay.
d.) Humans are not easily influenced by very large incentives, as the utility we assign to goods such as money flattens out as the amount gets large. This makes it very difficult, or even impossible, to outweigh the low probability of any acausal deal by a large amount of negative expected utility.
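Points c.) and d.) above can be made concrete with a small sketch. It assumes a common one-parameter hyperbolic form, value = amount / (1 + k * delay), and a logarithmic utility curve as one possible model of utility that flattens out. The parameter k, the scale and the numbers are illustrative assumptions, not measurements.

```python
import math


def hyperbolic_value(amount: float, delay_years: float, k: float = 1.0) -> float:
    """Present value of an incentive that arrives after `delay_years`.

    The discount factor 1 + k * delay grows with the length of the delay.
    The value of k is an illustrative assumption.
    """
    return amount / (1.0 + k * delay_years)


def log_utility(amount: float, scale: float = 1.0) -> float:
    """A utility curve that flattens out as the amount gets large."""
    return math.log1p(amount / scale)


# An incentive of size 101 that is a hundred years away is felt like
# an incentive of size 1 that applies right now (with k = 1 per year).
print(hyperbolic_value(101.0, delay_years=100.0, k=1.0))

# Making the stakes a million times larger does not make them feel
# a million times larger: the utility grows only by a small factor.
print(log_utility(1e9) / log_utility(1e3))
```

Under these assumptions a threat that lies far in the future and is stated in astronomically large terms still ends up with modest motivational weight, which is what makes humans poor targets for this kind of deal.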
11.) The scenario is probably computationally intractable or too expensive.
The number of possible agents to trade with in a multiverse is tremendous. Even if it were possible to simulate each possible agent, applying any kind of incentive to such a large number of agents could easily make it too expensive to engage in acausal trades, even given that resources are cheap.