Examining a comment by Eliezer Yudkowsky on AI risks

Here is a reply by Eliezer Yudkowsky to a comment by another user outlining how an AI could be trained to parse natural language correctly. Yudkowsky replied that “AIXI-ish devices wipe out their users and take control of their own reward buttons as soon as they can do so safely”. I suggest that you read both comments now.

What is interesting is that Yudkowsky’s comment currently stands at +10 upvotes. This is interesting because I am 90% sure that none of the people who upvoted the comment could honestly answer the following questions positively:

(1) Do you know of, and understand, formal proofs of the claims made in Yudkowsky’s comment?

(2) Do you have technical reasons to believe that such a natural language parser would be directly based on AIXI and that the above proofs would remain valid given such a specialized approximation to AIXI?

(3) In the absence of proofs and technical arguments, is your confidence in your comprehension of AIXI, and in your ability to predict its behavior as the design principle of an eventual artificial general intelligence, high enough to infer action-relevant conclusions about the behavior of these hypothetical systems?

(4) Can you be confident that Yudkowsky can answer the previous questions positively and that he is likely to be right, even without being able to verify these claims yourself?
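For context on question (1) — this is my own summary of the standard construction, not anything stated in Yudkowsky’s comment — AIXI (Hutter) selects actions by an expectimax over all programs q for a universal Turing machine U that are consistent with the interaction history of actions a, observations o, and rewards r, weighted by program length ℓ(q):

```latex
a_k := \arg\max_{a_k} \sum_{o_k r_k} \cdots \max_{a_m} \sum_{o_m r_m}
\big( r_k + \cdots + r_m \big)
\sum_{q \,:\, U(q,\, a_1 \ldots a_m) = o_1 r_1 \ldots o_m r_m} 2^{-\ell(q)}
```

The formal results that exist are proved for this incomputable ideal agent. Question (2) asks whether any of them transfer to a computable, specialized approximation such as a natural language parser, which is exactly what is not obvious.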


My perception is that people make predictions or hold beliefs about the behavior of highly speculative and hypothetical generally intelligent artificial systems without being able to state any formal or technical justification. And they are confident enough of these beliefs and predictions to actually give someone money in order to prevent such systems from being built.

To highlight my confusion about that stance, imagine there were no scientific consensus about global warming, no experiments, and no data confirming that global warming actually happens. Suppose that in this counterfactual world someone who lacked almost any reputation as a climatologist predicted that global warming would cause human extinction. Further suppose that this person was asking for money in order to implement a potentially dangerous and possibly unfeasible geoengineering scheme to stop global warming. Would you give this person your money?

If the answer is negative, what makes you behave differently with respect to risks associated with artificial general intelligence? Do you believe that it is somehow much easier to draw action-relevant conclusions about this topic?


  • rationalnoodles

    I donated to MIRI only to thank Eliezer for HPMoR and don’t care/know what he does with that money. So yes, I would.

  • Alexander Gabriel

    “Do you believe that it is somehow much easier to draw action relevant conclusions about this topic?”

    A Robin Hanson “Growth Mode” point of view says that projecting GDP out a few centuries probably means a singularity, meaning that some defining parameter of civilization, like world GDP, doubles every month. It does seem a very simple argument.

    There also don’t seem to be experts here, or I’m not sure who they are. If there were experts I would (probably) defer to them.

    Ideally we would wait for direct evidence of a singularity *before* acting. But that might not come until right before it happened. Right now, we have some weak suggestion that there may be a disaster and no way to collect other evidence. The question then is: do we act? (This is different from: should we implement Yudkowsky’s specific strategy or not?)

    I’m not very sure. It’s possible there is a simple argument that could persuade me we should do nothing.

    But for now I think action makes sense, because the only logic I see weakly suggests that the chance of a singularity in the 21st century is large. All other suggestions I’ve seen, like dismissing a singularity as too crazy, seem wrong. It’s better to act on a weak but reasonable inference than on something that is outright illogical.

  • Pingback: Alexander Kruel · MIRI/LessWrong Critiques: Index