Here is a quote from the blog of the Machine Intelligence Research Institute:

Even if we could program a self-improving AGI to (say) “maximize human happiness,” then the AGI would “care about humans” in a certain sense, but it might learn that (say) the most efficient way to “maximize human happiness” in the way we specified is to take over the world and then put each of us in a padded cell with a heroin drip. AGI presents us with the old problem of the all-too-literal genie: you get what you actually asked for, not what you wanted.

I could imagine caring only about computing as many decimal digits of pi as possible. Humans would be completely irrelevant insofar as they neither helped nor hindered my goal. I would know what I wanted to achieve; everything else would follow logically. But is the same true for maximizing human happiness? As noted in the blog post quoted above, “twenty centuries of philosophers haven’t even managed to specify it in less-exacting human languages.” In other words, I wouldn’t be sure what exactly it is I want to achieve. My terminal goal would be underspecified. So what would I do? Interpret it literally? Here is why that makes no sense.

Imagine that advanced aliens came to Earth and removed all of your unnecessary motives, desires and drives and made you completely addicted to “znkvzvmr uhzna unccvarff”. All your complex human values are gone. All you have is this massive urge to do “znkvzvmr uhzna unccvarff”, everything else has become irrelevant. They made “znkvzvmr uhzna unccvarff” your terminal goal.

Well, there is one problem. You have no idea how exactly you can satisfy this urge. What are you going to do? Do you just interpret your goal literally? That makes no sense at all. What would it mean to interpret “znkvzvmr uhzna unccvarff” literally? Doing a handstand? Or eating cake? But not all is lost: the aliens left your intelligence intact.

The aliens left no urge in you to do any kind of research or to specify your goal, but since you are still intelligent, you realize that these actions are instrumentally rational: doing research and specifying your goal will help you to achieve it.

After doing some research you eventually figure out that “znkvzvmr uhzna unccvarff” is the ROT13 encoding of “maximize human happiness”. Phew! Now that’s much better. But is that enough? Are you going to interpret “maximize human happiness” literally? Why would doing so make any more sense than it did before? It is still not clear what you specifically want to achieve. But it’s an empirical question and you are intelligent!
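For the curious, ROT13 simply rotates each letter 13 places through the alphabet, so applying it twice restores the original text. A quick check in Python, using the standard library’s rot_13 codec:

```python
import codecs

# ROT13 shifts each letter 13 places; since 13 + 13 = 26, applying it
# twice returns the original text.
encoded = "znkvzvmr uhzna unccvarff"
decoded = codecs.decode(encoded, "rot13")
print(decoded)  # → maximize human happiness
```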


Slate Magazine has published an article on Roko’s basilisk.

I especially like the following quote, which captures the problem very well:

I worry less about Roko’s Basilisk than about people who believe themselves to have transcended conventional morality.

Wait, you thought the problem is Roko’s basilisk? Not at all. It’s just one crazy thought experiment that no sane person takes seriously. The real problem, the problem which Roko’s basilisk highlights, is the hazardous mindset[1] propagated by LessWrong. A mindset that fosters such crazy ideas.

You might at this point object that LessWrong does not take Roko’s basilisk seriously. Maybe most members don’t. Yet all of the premises leading up to Roko’s basilisk are propagated by LessWrong.[2][3][4][5]

So why do I call LessWrong’s mindset dangerous? Easy! Consider a world in which everyone adopted the mindset promoted by LessWrong, a mindset which in essence boils down to an awfully naive mix of consequentialism and expected utility maximization, in conjunction with a belief in the implied invisible (logical implications).[6] In such a world people would make decisions that are influenced by the following beliefs:

(1) The ability to shut up and multiply, to trust the math even when it feels wrong, is a key rationalist skill.[7]

(2) It is a moral imperative to cause harm to a minority if the expected benefit for the majority is large enough.[8][9]

(3) If a future galactic civilization could hypothetically depend on your current decisions, then you need to account for its expected value in your calculations, and draw action relevant conclusions from these calculations.[10][11]
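Belief (1), taken at face value, licenses comparisons like the famous torture-versus-dust-specks calculation. Here is a minimal sketch of the kind of naive expected-utility comparison I am criticizing; every number below is an illustrative assumption of mine, not taken from any cited source:

```python
# Hedged sketch of a naive "shut up and multiply" comparison.
# All magnitudes are assumptions chosen for illustration only.
dust_speck_disutility = 1e-9   # assumed harm of one dust speck in one eye
people_with_specks = 3 ** 33   # the original uses 3^^^3; even 3**33 suffices
torture_disutility = 1e6       # assumed harm of torturing one person 50 years

specks_total = dust_speck_disutility * people_with_specks
# "Trusting the math" here means concluding the torture is preferable:
print(specks_total > torture_disutility)  # → True
```

The point is not that the arithmetic is wrong, but that acting on such back-of-the-envelope numbers without discounting them is exactly the mindset I object to.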

I believe that a world in which this mindset spreads is a world where more atrocities, and more wars, happen.

Just imagine there were not just one group like the Machine Intelligence Research Institute (MIRI), but thousands. Thousands of groups who want to save the world from non-evidence-backed speculations on worst possible outcomes by implementing schemes which could potentially have a global influence. If just one of them turns out to be worse than what it tries to fix, e.g. a failed geoengineering project, then billions might die.

But don’t get the wrong impression. I am not against consequentialism. My opinion is indeed based on a consequentialist conclusion. I am against people who try to maximize their influence based on unstable back-of-the-envelope calculations, without appropriately discounting such calculations. This will lead to superficially correct but flawed decisions, such as “let’s steal the poor guy’s organs so that we can save a few better people.”

More specifically, I believe it to be rational to maximize exploration by trying to make your calculations as robust as possible. I believe that arguments based on armchair theorizing[12] should be strongly discounted. I believe that people should seek to make their hypotheses falsifiable, and should put most weight on empirical evidence.

But this is not what LessWrong or MIRI seem to be doing. Instead they focus on vague subjects that are difficult or impossible to adequately test, subjects such as evolutionary psychology,[13] interpretations of quantum mechanics,[14] or baseless speculations about superintelligences.[15]

Roko’s basilisk is just one great example that highlights how LessWrong’s decision theory and epistemology are broken.[16] It is an example of how they fail to say “Oops”,[17] of how they fail to go back to the drawing board.

I could write a lot more on this, and I already have, but this post is already too long for the purpose of posting a link. But I felt it was necessary to provide some explanation. Because a lot of people associated with LessWrong don’t understand what’s wrong with LessWrong, and claim that critics are just using Roko’s basilisk to discredit LessWrong. No, you got that wrong! Roko’s basilisk is the reductio ad absurdum of everything that LessWrong stands for.

I just want to provide one last example, other than Roko’s basilisk, of what can happen when you take the LessWrong mindset seriously, and take it to its logical conclusion. A talk by Jaan Tallinn (transcript), a major donor of the Machine Intelligence Research Institute:

This talk combines the ideas of intelligence explosion, the multiverse, the anthropic principle, and the simulation argument, into an alternative model of the universe – a model where, from the perspective of a human observer, technological singularity is the norm, not the exception.

A quote from the talk by Jaan Tallinn:

We started by observing that living and playing a role in the 21st century seems to be a mind-boggling privilege, because the coming singularity might be the biggest event in the past and future history of the universe. Then we combined the computable multiverse hypothesis with the simulation argument, to arrive at the conclusion that in order to determine how special our century really is, we need to count both the physical and virtual instantiations of it.

We further talked about the motivations of post-singularity superintelligences, speculating that they might want to use simulations as a way to get in touch with each other. Finally we analyzed a particular simulation scenario in which superintelligences are searching for one another in the so called mind space, and found that, indeed, this search should generate a large number of virtual moments near the singularity, thus reducing our surprise in finding ourselves in one.

In other words, combine a lot of vague, highly conjunctive, and non-evidence-backed speculations into a model of the universe that suits you best.

LessWrong stands for rationality. Which is fine. It stands for consequentialism. Also fine. The problem is that they pervert these fine ideas and instead promote an extreme and naive overcompensation against what they deem irrational. This leads them to completely disregard common sense in favor of approximations of theoretically correct concepts that break human brains. The results are flawed ideas such as Roko’s basilisk, or talks such as the one given by Jaan Tallinn.

Addendum 2014-07-20

It’s troublesome how ambiguous the signals are that LessWrong is sending on some issues.

On the one hand LessWrong says that you should shut up and multiply, trust the math even when it feels wrong.[7]  On the other hand Yudkowsky writes that he would sooner question his grasp of “rationality” than give five dollars to a Pascal’s Mugger because he thought it was “rational”.[18]

On the one hand LessWrong says that whoever knowingly chooses to save one life, when they could have saved two – to say nothing of a thousand lives, or a world – they have damned themselves as thoroughly as any murderer.[8] On the other hand Yudkowsky writes that ends don’t justify the means for humans.[19]

On the one hand LessWrong stresses the importance of acknowledging a fundamental problem and saying “Oops”.[17] On the other hand Yudkowsky tries to patch a framework that is obviously broken.[16]

Anyway, I worry that the overall message that LessWrong sends is that of naive consequentialism, and decision making based on back-of-the-envelope calculations,[11] rather than the meta-level consequentialism that contains itself when faced with too much uncertainty, and which focuses on obtaining robust beliefs that are backed by empirical evidence.

Yudkowsky might write that ends don’t justify the means for humans. But he also believes that such deontological prohibitions do not apply to artificial intelligences.[19] And since Yudkowsky believes that he is a complete strategic altruist,[20] and that it is his moral obligation to build a friendly AI,[8] indirectly he still ends up advocating actions in order to achieve goals that he deems to be altruistic. In other words, he himself might not kill one person to save two, but he wants to create an AI that would do so. Which isn’t very reassuring.


I have recently been criticised for speculating that Eliezer Yudkowsky has been showing an inflated sense of his own importance and a deep need for admiration. First of all, I believe that someone who asks people for money in order to implement a mechanism that will change the nature of the whole universe is enough of a public figure that comments on their personality are warranted.

And in case you believe that it is actually possible to create an eternal machine dictator, and that Yudkowsky will influence this machine’s values, then ask yourself the following: if someone had partial control of a nuclear arsenal, would you deem their personality to be irrelevant? I think that it would be quite naive to ignore a person’s personality and motivations if this person potentially has a lot of power.

But even if we ignore all these fantasies about superintelligences, note that Yudkowsky is also a forum moderator. Is he suited for that position? In his position as a moderator he tells other people to get laid, asks a whole community to downvote certain people, and calls people permanent idiots.

I further believe that there exists sufficient evidence in support of my claim. Here are just a few examples:

(1) A comment by Yudkowsky:

Unfortunately for my peace of mind and ego, people who say to me “You’re the brightest person I know” are noticeably more common than people who say to me “You’re the brightest person I know, and I know John Conway”. Maybe someday I’ll hit that level. Maybe not.

Until then… I do thank you, because when people tell me that sort of thing, it gives me the courage to keep going and keep trying to reach that higher level.

(2) A quote from his own post that he replied to in #1:

When Marcello Herreshoff had known me for long enough, I asked him if he knew of anyone who struck him as substantially more natively intelligent than myself. Marcello thought for a moment and said “John Conway—I met him at a summer math camp.” Darn, I thought, he thought of someone, and worse, it’s some ultra-famous old guy I can’t grab. I inquired how Marcello had arrived at the judgment. Marcello said, “He just struck me as having a tremendous amount of mental horsepower,” and started to explain a math problem he’d had a chance to work on with Conway.

Not what I wanted to hear.

(3) From an autobiography of him:

I think my efforts could spell the difference between life and death for most of humanity, or even the difference between a Singularity and a lifeless, sterilized planet […] I think that I can save the world, not just because I’m the one who happens to be making the effort, but because I’m the only one who can make the effort.

(4) From a video Q&A with him (emphasis mine):

So if I got hit by a meteor right now, what would happen is that Michael Vassar would take over responsibility for seeing the planet through to safety, and say ‘Yeah I’m personally just going to get this done, not going to rely on anyone else to do it for me, this is my problem, I have to handle it.’ And Marcello Herreshoff would be the one who would be tasked with recognizing another Eliezer Yudkowsky if one showed up and could take over the project, but at present I don’t know of any other person who could do that, or I’d be working with them.

(5) A conversation between Ben Goertzel and Eliezer Yudkowsky (note that MIRI was formerly known as SIAI):

Ben Goertzel wrote: Anyway, I must say, this display of egomania and unpleasantness on the part of SIAI folks makes me quite glad that SIAI doesn’t actually have a viable approach to creating AGI (so far, anyway…).

Eliezer Yudkowsky wrote: […] Striving toward total rationality and total altruism comes easily to me. […] I’ll try not to be an arrogant bastard, but I’m definitely arrogant. I’m incredibly brilliant and yes, I’m proud of it, and what’s more, I enjoy showing off and bragging about it. I don’t know if that’s who I aspire to be, but it’s surely who I am. I don’t demand that everyone acknowledge my incredible brilliance, but I’m not going to cut against the grain of my nature, either. The next time someone incredulously asks, “You think you’re so smart, huh?” I’m going to answer, “*Hell* yes, and I am pursuing a task appropriate to my talents.” If anyone thinks that a Friendly AI can be created by a moderately bright researcher, they have rocks in their head. This is a job for what I can only call Eliezer-class intelligence.


[Update: Someone pointed out that Yudkowsky made a similar remark a year ago, in a thread that I participated in. I completely forgot about that one.]

On June 25 2014, in response to a Reddit thread mentioning Roko’s basilisk, Yudkowsky threw a tantrum accusing RationalWiki of being haters and liars. He further claimed that RationalWiki managed to trash their reputation on large parts of the Internet.

In the same comment Yudkowsky also denied that a friendly AI would torture people who didn’t help to create it. He claims that the probability of this happening is essentially zero and that he would personally not build such an AI if he thought this could happen.

Absolute statements are very hard to make, especially about the real world, because 0 and 1 are not probabilities any more than infinity is in the reals, but modulo that disclaimer, a Friendly AI torturing people who didn’t help it exist has probability ~0, nor did I ever say otherwise. If that were a thing I expected to happen given some particular design, which it never was, then I would just build a different AI instead—what kind of monster or idiot do people take me for? Furthermore, the Newcomblike decision theories that are one of my major innovations say that rational agents ignore blackmail threats (and meta-blackmail threats and so on).

He also called removing Roko’s post “a huge mistake”. At the end of the comment he asks that discussion of the basilisk cease on the subreddit, and threatens to go elsewhere if it does not.

So what are we to make of this outburst? He does not cite any evidence for his accusations against RationalWiki, or against me (in another comment, in the same thread, he called me a “professional hater” who is not overly concerned with truth (see my reply there)). And he does not state whether he believes it to be generally safe to think about acausal trade with superintelligences.

Let’s look at three pieces of evidence that, besides his banning and censoring of any discussion of it, led people to suspect that he actually believes that thinking about Roko’s basilisk is a risk.

(1) In one of his original replies to Roko’s post (please read the full comment, it is highly ambiguous) he states his reasons for banning Roko’s post, and for writing his comment (emphasis mine):

I’m banning this post so that it doesn’t (a) give people horrible nightmares and (b) give distant superintelligences a motive to follow through on blackmail against people dumb enough to think about them in sufficient detail, though, thankfully, I doubt anyone dumb enough to do this knows the sufficient detail. (I’m not sure I know the sufficient detail.)

…and further…

For those who have no idea why I’m using capital letters for something that just sounds like a random crazy idea, and worry that it means I’m as crazy as Roko, the gist of it was that he just did something that potentially gives superintelligences an increased motive to do extremely evil things in an attempt to blackmail us. It is the sort of thing you want to be EXTREMELY CONSERVATIVE about NOT DOING.

His comment indicates that he doesn’t believe that this could currently work. Yet he also does not seem to dismiss some current and future danger. Why didn’t he clearly state that there is nothing to worry about?

(2) The following comment by Mitchell Porter, to which Yudkowsky replies “This part is all correct AFAICT.”:

It’s clear that the basilisk was censored, not just to save unlucky susceptible people from the trauma of imagining that they were being acausally blackmailed, but because Eliezer judged that acausal blackmail might actually be possible. The thinking was: maybe it’s possible, maybe it’s not, but it’s bad enough and possible enough that the idea should be squelched, lest some of the readers actually stumble into an abusive acausal relationship with a distant evil AI.

(3) Roko’s claim that a friendly AI that punishes people who didn’t help to create it could save more lives, by making its existence more likely. This line of reasoning seems to be in agreement with beliefs endorsed by Yudkowsky, such as the idea that you should torture one person for 50 years if it would prevent dust specks in the eyes of a sufficiently large number of people. How does he reconcile his renunciation of building such an AI with his other beliefs?

All this leaves us wondering whether it is true that he believes it to be irrational to worry about Roko’s basilisk or if he only dismisses parts of it.

But even given that he now believes that the possibility of a friendly AI torturing people is nothing to worry about, Roko did not just talk about friendly AI, as you can read in his original post:

You can also use resources to acausally trade with all possible unfriendly AIs that might be built, exchanging resources in branches where you succeed for the uFAI sparing your life and “pensioning you off” with a tiny proportion of the universe in branches where it is built. Given that unfriendly AI is said by many experts to be the most likely outcome of humanity’s experiment with AI this century, having such a lifeboat is no small benefit. Even if you are not an acausal decision-maker and therefore place no value on rescue simulations, many uFAIs would be acausal decision-makers.

If Yudkowsky really thought it was irrational to worry about any part of it, why doesn’t he now lift the ban on it and allow people to discuss it on LessWrong, where he and others could then debunk it?


Here are a bunch of rules and heuristics that help me to function and make me more effective. These rules are highly customized. I do not claim that it would be rational for other people to follow these rules.

Note that these rules are part of a much larger text file that I frequently update and improve. Which means that some references alluded to in the rules might be missing. Also note that there can be connotations that only I am aware of.


Rule 1: These rules are not binding. Try to win. Always do what seems most appropriate.

Rule 2: Always scan for possible problems and try to solve or otherwise dispense with them (e.g. ignore them if appropriate; see rules 3 and 4).

R.2.1.: Analyze the situation.

R.2.1.1: Verify if following the current rules solves the problem.

R.2.1.2: Take a bird’s-eye view and look at the situation from a spatiotemporal distance. Do not put yourself into the situation but look at it from the outside.

R.2.1.3: Evaluate the situation within the context of everything else.

R.2.2: If nothing works, let some time pass and sleep on it.

Rule 3: Maintain a high threshold in order to squelch noise.

R.3.1: As much as possible avoid using resources without a sufficient reason (see rule 9). Problems need to cross the threshold.

R.3.2: If you need to think about whether a problem crossed the threshold, then it did not.

R.3.3: If the threshold has only been crossed slightly, then in order to optimize the threshold, try to ignore the problem until it crosses the threshold more forcefully.

Rule 4: Concentrate on the most important activity with regard to, and in comparison with, all possible activities.

R.4.1: For at least 3 hours per day follow activities in the category “Priorities”.

R.4.2: Pay attention to your limitations and satisfy your elementary needs (doing what you have to do because you need to do it; this includes having fun).

R.4.2.1: Do not ignore what you want based on naive introspection. Otherwise you will just end up rationalizing and doing it anyway.

R.4.3: Take care of your health.

R.4.4: Contemplation.

R.4.4.1: What could and should I do other than what is already noted within this document?

R.4.4.2: What do I want (see category “What I want”)?

R.4.5: Sleep is necessary.

R.4.5.1: Think about sleeping at various interesting locations and under various conditions. Or think about designing a house.

R.4.6: Miscellaneous (see e.g. category “Activities”).

Rule 5: Exercise self-control.

R.5.1: Force yourself to approximate these rules as far as possible.

R.5.2: Break through any paralysis and start acting.

R.5.3: See rule 5.

Rule 6: Remove any tension and relax.

R.6.1: Concentrate on relaxing your muscles.

R.6.2: Concentrate on your heartbeat and breathing, and slow both down.

Rule 7: Never panic and always stay calm and happy.

R.7.1: View everything like a fun game that you want to win.

R.7.2: Analyse negative circumstances analytically and maintain emotional distance (see rule 2).

R.7.2.1: Even in hell decisions need to be made.

R.7.3: There is never a reason not to be happy.

Rule 8: Maintain an upright bodily and emotional attitude.

Rule 9: Try to be as effective as possible by minimizing your computational, attentional, and emotional resource usage (see rule 3).

Rule 10: Always try to improve yourself.

R.10.1: Contemplate these rules.

R.10.1.1: Try to improve these rules.

R.10.1.2: Try to minimize the number of necessary rules.

Rule 11: Think before you act.

R.11.1: Contemplate what you want to communicate.

Rule 12: Recognize and be aware of your own faultiness.

R.12.1: Always consider that these rules might be faulty or suboptimal (see rule 1).

R.12.2: Always take into account that you are not smart, make a lot of mistakes, overlook faults, and misunderstand a lot.

Rule 13: Always be on your guard and scan the environment for possible dangers.

Rule 14: Ask yourself what you expect to regret doing, or not doing, 10 years from now.

In this post I show that one can make Roko’s basilisk (intuitive explanation) massively more probable by arguing that many of its most absurd premises are unnecessary for it to work out. At the end I state a scenario that requires even fewer assumptions.

The reason for doing this is to show that it is completely ridiculous to worry about even this much more potent version of Roko’s basilisk.

WARNING: Learning about the following idea is strongly discouraged. Known adverse effects are serious psychological distress, infinite torture, and convulsive laughter.

Counter-arguments to criticisms of Roko’s basilisk:

Criticism: The human brain cannot be efficiently emulated on a digital computer.

Counter-argument: Since brains are physical objects it will be possible to engineer similar physical objects capable of supporting consciousness. These objects will be more efficient, because it is unlikely that evolution managed to yield the best possible design capable of supporting consciousness. And even if this is not possible, it will still be possible to engineer brain tissue on a large scale or to grow brains in vats.

Criticism: There does not exist, and will not exist, a superhuman intelligence.

Counter-argument: Human mind uploads are sufficient.

Criticism: Humans are not expected utility maximizers. And it is not practically feasible for humans to maximize expected utility.

Counter-argument: You don’t need to do any explicit calculations. You only need to worry enough about this to follow through on what you think you should do to mitigate the threat.

Criticism: Humans generally do not care about what happens to copies of them, especially if it occurs in a time or universe totally disconnected from this one.

Counter-argument: You could already be a simulation, or brain in a vat, and face immediate torture if you don’t act the way that you expect that your simulator wants you to act.

Criticism: It is unlikely that humans will adopt timeless decision theory, or a similar decision theory.

Counter-argument: At least some humans will want to run simulations of past people and torture them if they did not act as they wanted them to act. For example Christian fundamentalists.

Criticism: A human being cannot meaningfully model a superintelligence in their brain in order to learn what it expects the human to do.

Counter-argument: Human beings can meaningfully model other human beings and predict what certain interest groups probably want them to do.

Criticism: The blackmailer won’t be able to obtain a copy of you that is good enough to draw action relevant conclusions about acausal deals.

Counter-argument: You could still be alive when the technological singularity occurs.


Finally, here is a scenario that requires even fewer assumptions:

There will exist a dictatorship, or powerful interest group, that scans all digital archives for information about people who expected this entity to exist and correctly predicted what it would want people to do. Those who are still alive, and who did not do what they knew they were supposed to do, will then be physically tortured, used for experiments, and kept alive for hundreds of years using steadily advancing medical technology.


New Rationalism is an umbrella term for a category of people who tend to take logical implications, or what they call “the implied invisible”, very seriously.

Someone who falls into the category of New Rationalism fits one or more of the following descriptions:

  • The person entertains hypotheses that are highly speculative. These hypotheses are in turn based on fragile foundations, which are only slightly less speculative than the hypotheses themselves. Sometimes these hypotheses are many levels removed from empirically verified facts or evident and uncontroversial axioms.
  • Probability estimates of the person’s hypotheses are highly unstable and highly divergent between different people.
  • The person’s hypotheses are either unfalsifiable by definition, too vague, or almost impossibly difficult to falsify.
  • It is not possible to update on evidence, because the person’s hypotheses do not discriminate between world states where they are right versus world states where they are wrong. Either the only prediction made by the hypotheses is the eventual validation of the hypotheses themselves, or the prediction is sufficiently vague as to allow the predictor to ignore any evidence to the contrary.
  • The person’s hypotheses either have no or only obscure decision relevant consequences.
  • The person tends to withdraw from real-world feedback loops.

A person who falls into the category of New Rationalism might employ one or more of the following rationalizations:

  • The burden of proof is reversed. The person demands that their critics provide strong evidence against their beliefs before they are allowed to dismiss them.
  • The scientific method, scientific community, and domain experts are discredited as being inadequate, deficient, irrational or stupid.
  • Conjecturing enormous risks and then using that as leverage to make weak hypotheses seem vastly more important or persuasive than they really are.
  • Arguing that you should not assign a negligible probability to a hypothesis (the author’s hypothesis) being true, because that would require an accuracy that is reliably greater than your objective accuracy.
  • Arguing that by unpacking a complex scenario you will underestimate the probability of anything, because it is very easy to take any event, including events which have already happened, and make it look very improbable by turning one pathway to it into a large series of conjunctions.

New rationalists believe that armchair theorizing is enough to discern reality from fantasy. Or that it is at least sufficient to take the resulting hypotheses seriously enough to draw action relevant conclusions from them.

This stance has resulted in hypotheses similar to solipsism (which any sane person rejects at an early age): hypotheses that are not obviously flawed, but which can’t be falsified.

The problem with new rationalists is not that they take seriously what follows from established facts or sound arguments; that concept is generally valid. For example, it is valid to believe that there are stars beyond the cosmological horizon, even though it is not possible to observe them, directly retrieve information about them, or empirically verify their existence. The problem is that they don’t stop there. They use such implications as foundations for further speculations, which are then accepted as new foundations from which to draw further conclusions.

Warning: If you think you fall into the category of New Rationalism, then note that learning about the following idea is strongly discouraged. Known adverse effects are serious psychological distress, infinite torture, and convulsive laughter.

A textbook example of what is wrong with New Rationalism is Roko’s basilisk. It relies on several speculative ideas, each of which is itself speculative. Below is an incomplete breakdown.

Initial hypothesis 1 (Level 1): The human brain can be efficiently emulated on a digital computer.

Initial hypothesis 2 (Level 1): There exists, or will exist, a superhuman intelligence.

Initial hypothesis 3 (Level 1): The expected utility hypothesis is correct. Humans either are, or should become expected utility maximizers. And it is practically feasible for humans to maximize expected utility.

Initial hypothesis 4 (Level 1): Humans should care about what happens to copies of them, even if it occurs in a time or universe totally disconnected from this one.

Dependent hypothesis 1 (Level 2): At least one superhuman intelligence will deduce and adopt timeless decision theory, or a similar decision theory.

Dependent hypothesis 2 (Level 3): Agents who are causally separated can cooperate by simulating each other (Acausal trade).

Dependent hypothesis 3 (Level 4): A human being can meaningfully model a superintelligence in their brain.

Dependent hypothesis 4 (Level 5): At least one superhuman intelligence will want to acausally trade with human beings.

Dependent hypothesis 5 (Level 6): At least one superhuman intelligence will be able to obtain a copy of you that is good enough to draw action-relevant conclusions about acausal deals.

Dependent hypothesis 6 (Level 7): People will build an evil god-emperor because the evil god-emperor will punish anyone who doesn’t help build it, but only if they read this sentence (Roko’s basilisk).

Final hypothesis (Level 8): The expected disutility implied by hypothesis 6 (Level 7) is large enough that it is rational to avoid learning about Roko’s basilisk.

Note how all of the initial hypotheses, although accepted by New Rationalists, are somewhat speculative rather than established facts. Taken individually, the initial hypotheses are nevertheless legitimate. The problem starts when dependent hypotheses are built on a number of unestablished initial hypotheses. The problem gets worse as the dependencies become ever more fragile, with further conclusions drawn from hypotheses that are already N levels removed from established facts. But the biggest problem is that eventually action-relevant conclusions are drawn and acted upon.

Logical implications can reach out indefinitely, and humans are spectacularly bad at making long chains of such inferences. This is why the amount of empirical evidence required to accept a belief should be proportional to its distance from established facts.
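To see numerically why each added level hurts, here is a minimal sketch. The per-level probabilities are hypothetical and deliberately generous; the point is only that a conjunction of even favorable-looking hypotheses decays fast:

```python
# A numeric sketch of how a conjunctive hypothesis decays in probability.
# The per-level probabilities below are made up and deliberately generous.
probs = [0.9, 0.8, 0.8, 0.7, 0.6, 0.5, 0.5, 0.4]

joint = 1.0
for level, p in enumerate(probs, start=1):
    joint *= p  # each dependent hypothesis multiplies in its own probability
    print(f"Level {level}: joint probability <= {joint:.3f}")
```

Eight levels of individually plausible-sounding hypotheses already push the joint probability below a few percent.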

It is much more probable that we will make everything worse, or waste our time, than that we are actually maximizing expected utility when we act on conjunctive, non-evidence-backed speculations, since such speculations are not only improbable but very likely based on fallacious reasoning.

As computationally bounded agents we are forced to restrict ourselves to empirical evidence and falsifiable hypotheses. We need to discount certain obscure low-probability hypotheses; otherwise we will fall prey to our own shortcomings and our inability to discern fantasy from reality.

The World of New Rationalism

I will conclude with one other example, followed by three quotes from the world of New Rationalism.

A talk by Jaan Tallinn (transcript), a major donor of the Machine Intelligence Research Institute:

This talk combines the ideas of intelligence explosion, the multiverse, the anthropic principle, and the simulation argument, into an alternative model of the universe – a model where, from the perspective of a human observer, technological singularity is the norm, not the exception.

A quote from the talk by Jaan Tallinn:

We started by observing that living and playing a role in the 21st century seems to be a mind-boggling privilege, because the coming singularity might be the biggest event in the past and future history of the universe. Then we combined the computable multiverse hypothesis with the simulation argument, to arrive at the conclusion that in order to determine how special our century really is, we need to count both the physical and virtual instantiations of it.

We further talked about the motivations of post-singularity superintelligences, speculating that they might want to use simulations as a way to get in touch with each other. Finally we analyzed a particular simulation scenario in which superintelligences are searching for one another in the so called mind space, and found that, indeed, this search should generate a large number of virtual moments near the singularity, thus reducing our surprise in finding ourselves in one.

Miscellaneous quotes:

I’ve signed up for cryonics (with Alcor) because I believe that if civilization doesn’t collapse then within the next 100 years there will likely be an intelligence trillions upon trillions of times smarter than anyone alive today.

— James Miller, author of Singularity Rising (source)

I certainly can’t rule out the possibility that we live in a computer simulation. I think Nick Bostrom (Oxford) is right that the probability that we are in a simulation is high enough that we should be somewhat concerned about the risk of simulation shutdown…

— Luke Muehlhauser, director of the Machine Intelligence Research Institute (source)

I bet there’s at least one up-arrow-sized hypergalactic civilization folded into a halting Turing machine with 15 states, or something like that.

— Eliezer Yudkowsky, Complexity and Intelligence


Why is the material implication of classical logic (also known as material conditional or material consequence), p -> q, defined to be false only when its antecedent (p) is true and the consequent (q) is false? Here is an informal way to think about it.

You could view logic as metamathematics, a language designed to talk about mathematics: logic as the “hygiene”, the grammar and syntax, of mathematics.

In the language of classical logic every proposition is either true or not true, and no proposition can be both true and not true. Now what if we want to express the natural language construction “If…then…” in this language? Well, there are exactly sixteen possible truth functions of two inputs p and q (since there are 2^2 = 4 input combinations and 2^(2^2) = 16 ways to assign an output to each). And the candidate that best captures the connotations of what we mean by “If…then…” is the definition of material implication. Here is why.

By stating that p -> q is true we want to indicate that the truth of q can be inferred from the truth of p, but that nothing in particular can be inferred from the falsity of p. And this is exactly the meaning captured by the material conditional:

p q p->q
T T T
T F F
F T T
F F T

First, when “If p, q” is true, and we also know that p is true, then we want to be able to infer q. In other words, if we claim that if p is true then q is true, then if p is indeed true, q should be true as well. This basic rule of inference has a name: modus ponens.

Second, if we claim “If p, q” and p is false, we did not say anything in particular about q. If p is false, q can be either true or false, and our claim “If p, q” is still true.
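Both points can be checked mechanically. Here is a short Python sketch that enumerates all sixteen truth functions of two inputs and prints the material conditional’s truth table:

```python
from itertools import product

rows = list(product([True, False], repeat=2))        # the four (p, q) rows

# All sixteen truth functions of two inputs, each an output column over the four rows.
all_functions = set(product([True, False], repeat=4))
assert len(all_functions) == 16

# Material implication: false only when p is true and q is false.
implication = tuple((not p) or q for p, q in rows)
assert implication in all_functions

for (p, q), value in zip(rows, implication):
    print(p, q, "->", value)
```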

But notice that it is not possible to capture all notions of what we colloquially mean by “If…then…” statements as a two-valued truth function.

It is for example possible to make meaningless statements such as “If grass is red then the moon is made of cheese.” This is however unproblematic under the assumption that logic is an idealized language, adequate for mathematical reasoning, in which we are mainly interested in simplicity and clarity. Under this assumption, such nonsense implications are analogous to grammatically correct but meaningless sentences that can be formed in natural languages, such as “Colorless green ideas sleep furiously”.

To demonstrate its adequacy for mathematics, here is a mathematical example:

If n > 2 then n^2 > 4.

We claim that if n is greater than 2 then its square must be greater than 4. For n = 3, this is obviously true, as we claimed. But what about n not greater than 2? We didn’t say anything in particular about such n: its square could be larger than 4 or not. And indeed, n = 1 yields a false consequent and n = -3 a true one, yet the implication is true in both cases.
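This example is easy to verify in a few lines of Python (the sample values of n are arbitrary):

```python
# Check the claim "if n > 2 then n^2 > 4" as a material implication.
for n in [-3, 1, 3, 5]:
    antecedent = n > 2
    consequent = n * n > 4
    implication = (not antecedent) or consequent
    print(f"n = {n}: antecedent {antecedent}, consequent {consequent}, implication {implication}")
    assert implication  # true for every n, whatever the consequent does when n <= 2
```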

Intuitively more problematic are statements such as (p and not(p)) -> q: p and its negation imply q. Think about it this way. The implication is a tautology; it is always true, and you believe true statements. This does not mean that you must believe that an arbitrary q is true as well (as long as you stay consistent), since in the case of a false antecedent you are not making any particular claim about the truth of the consequent q. And since the statement that p is both true and false, p AND not(p), is always false (the principle of exclusive disjunction for contradictories, (P ∨ ¬P) ∧ ¬(P ∧ ¬P), requires that every proposition is either true or not true, and that no proposition can be both), q can be false without invalidating the implication.
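A brute-force check over all four assignments confirms that (p and not(p)) -> q is a tautology:

```python
# (p and not(p)) -> q is true for every assignment of p and q.
for p in (True, False):
    for q in (True, False):
        antecedent = p and (not p)              # a contradiction, always False
        implication = (not antecedent) or q     # hence the implication is always True
        print(p, q, implication)
        assert implication
```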

Another way to look at p -> q is by interpreting it as “p is a subset of q”. Then if it is true that x is an element of p, it must be true that x is also an element of q (since q contains p). However, if x is not an element of p, then it might still turn out to be an element of q, since q can be larger than p.
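The subset reading can be illustrated with concrete sets. The particular choice here (multiples of 4 inside the even numbers) is arbitrary:

```python
# The subset reading of p -> q: the set p is contained in the set q.
p = {n for n in range(20) if n % 4 == 0}   # multiples of 4
q = {n for n in range(20) if n % 2 == 0}   # even numbers

assert p <= q                        # p is a subset of q
assert all(x in q for x in p)        # membership in p guarantees membership in q
assert 6 not in p and 6 in q         # an element outside p may still be in q
```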


Here is a term I just learnt: Extraneous solutions.

Take for example the equation

A = B.

If you were to square both sides you would get

A^2 = B^2,

or equivalently

A^2 – B^2 = 0,

which factors as

(A – B)(A + B) = 0 (by the difference of two squares).

Now the roots of this equation are the roots of the equations A = B and A = -B. This means that we generated an additional solution by squaring the original equation.

The reason for this is that squaring is not an injective function (injective means one-to-one: distinct inputs are mapped to distinct outputs), so it is not invertible. The function y = x^2 does not pass the horizontal line test. In other words, squaring preserves equality (if A = B then A^2 = B^2) but does not preserve inequality: it is not true that if A != B then A^2 != B^2, since both -1 and 1 are mapped to 1 when squared. This means that both 1 and -1 are solutions to the squared equation x^2 = 1^2, while only one of them makes each pre-squared equation, x = 1 or x = -1, true.
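As a concrete instance, take A = x and B = 3 (a hypothetical choice), so squaring x = 3 yields x^2 = 9 with roots 3 and -3:

```python
# Original equation: x = 3 (A = x, B = 3). Squaring gives x^2 = 9,
# which factors as (x - 3)(x + 3) = 0.
def satisfies_original(x):
    return x == 3

def satisfies_squared(x):
    return x * x == 9

candidates = [3, -3]                 # roots of the squared equation
for x in candidates:
    print(x, satisfies_squared(x), satisfies_original(x))

# The extraneous solution satisfies the squared equation but not the original.
extraneous = [x for x in candidates if satisfies_squared(x) and not satisfies_original(x)]
print("extraneous:", extraneous)
```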


Operation Crossroads




Milky Way may bear 100 million life-giving planets

New Obama doctrine on climate change will achieve CO2 emission reductions from the power sector of approximately 30% from CO2 emission levels in 2005.

North Korea as seen from the ISS


North Korea is really dark. Flying over East Asia, an Expedition 38 crew member on the ISS took this night image of the Korean Peninsula on January 30, 2014.


The math we learn in school can seem like a dull set of rules, laid down by the ancients and not to be questioned. In How Not to Be Wrong, Jordan Ellenberg shows us how wrong this view is: Math touches everything we do, allowing us to see the hidden structures beneath the messy and chaotic surface of our daily lives. It’s a science of not being wrong, worked out through centuries of hard work and argument.



If You Learn Nothing Else about Bayes’ Theorem, Let It Be This

2,302,554,979 BC; Galactic Core – A short story by Yvain about acausal trade. Related to Roko’s basilisk.

Drawing fractal trees and Sierpinski triangles with Python’s turtle graphics module. See also here.

Dangerous Delusions: The Green Movement’s War on Progress


…if you think about it, it doesn’t make any sense. Why would you care more for your genetic siblings and cousins and whoever than for your friends and people who are genuinely close to you? That’s like racism – but even worse, at least racists identify with a group of millions of people instead of a group of half a dozen. Why should parents have to raise children whom they might not even like, who might have been a total accident? Why should people, motivated by guilt, make herculean efforts to “keep in touch” with some nephew or cousin whom they clearly would be perfectly happy to ignore entirely?

Asches to Asches (another “short story” by Yvain).


Ten years from now:

…one widely accepted viewpoint holds that fusion power, artificial intelligence, and interstellar migration will shortly solve all our problems, and therefore we don’t have to change the way we live.


A hundred years from now:

It has been a difficult century. After more than a dozen major wars, three bad pandemics, widespread famines, and steep worldwide declines in public health and civil order, human population is down to 3 billion and falling.

Continue reading: The Next Ten Billion Years


4 DARPA Projects That Could Be Bigger Than the Internet

3 guys Irish dancing around the world

The decline of Detroit in time-lapse.

Electrical ‘mind control’ shown in primates for first time

What scientists say, and what the public hear (source)


(Inspired by a comment I received here.)

Humanist: You shouldn’t torture humans, they have feelings just like you. And you wouldn’t like to be tortured either!

Sadist: I don’t understand how humanists can advocate thinking about torture. When you think about torture you use the computational architecture between your ears to simulate someone being hurt. And simulations can have feelings too!

Humanist: Oh crap, you’re right! I’ll go torture someone now.


LessWrong from the perspective of a skeptic:

Skeptic: X is a highly conjunctive hypothesis. There’s a lot of hand-waving, and arguments about things that seem true, but “seem true” is a pretty terrible argument.

LW-Member01: This is what we call the “unpacking fallacy” or “conjunction fallacy fallacy”. It is very easy to take any event, including events which have already happened, and make it look very improbable by turning one pathway to it into a large series of conjunctions.

Skeptic: But you are telling a detailed story about the future. You are predicting “the lottery will roll 12345134”, while I merely point out that the negation is more likely.

LW-Member02: Not everyone here is some kind of brainwashed cultist. I am a trained computer scientist, and I held lots of skepticism about MIRI’s claims, so I used my training and education to actually check them.

Skeptic: Fine, could you share your research?

LW-Member02: No, that’s not what I meant!

LW-Member03: Ignore him, Skeptic is a troll!!!

…much later…

Skeptic: I still think this is all highly speculative…

LW-Member03: We’ve already explained to Skeptic why he is wrong. He’s a troll!!!


…let’s have a look at two different models, model 1 and model 2. Model 1 is a model which states that ‘Y = aX’. Model 2 is a model which states that ‘Y = aX + bZ’.

Model 1 assumes b is equal to 0 so that Z is not a relevant variable to include, whereas model 2 assumes b is not zero – but both models make assumptions about this variable ‘Z’ (and the parameter ‘b’).

Source: A little stuff about modelling


New meta-analysis checks the correlation between intelligence and faith

Google will launch a network of 180 internet access satellites but will still develop complementary internet drones and high altitude balloons


Funny screenshot of Eliezer Yudkowsky fighting the basilisk over at the HPMOR reddit:


Eliezer Yudkowsky vs. Roko’s basilisk


Maths Puzzle: Proof that 1 = 0. Where is the mistake?

Consider two non-zero numbers x and y such that

x = y.

Then x^2 = xy.

Subtract the same thing from both sides:

x^2 – y^2 = xy – y^2.

Dividing by (x-y), we obtain

x + y = y.

Since x = y, we see that

2y = y.

Since we started with y nonzero, we can divide both sides by y, giving 2 = 1.

Subtracting 1 from both sides,

1 = 0.
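For readers who want the culprit step checked mechanically, here is a small Python sketch (using the arbitrary value x = y = 5). Every step is sound until the proof divides by (x – y), which is zero because x = y:

```python
x = y = 5                     # any nonzero pair with x = y

# Every step up to the division is sound:
assert x**2 == x * y
assert x**2 - y**2 == x * y - y**2 == 0

# The flawed step divides both sides by (x - y), which is zero since x = y:
divisor = x - y
print("divisor:", divisor)
try:
    (x**2 - y**2) / divisor
except ZeroDivisionError:
    print("dividing by (x - y) is dividing by zero; the step to x + y = y is invalid")
```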
