Why you should be afraid of friendly AI

This is a reply to a comment on my last post:

SI is one of a very small number of groups that takes the prospect of an AGI-induced singularity seriously. There’s no more established, FDA-approved alternative. So if I want to help fund a successful AGI singularity then the SI (or comparable organisation) is my only option, right?

Given that you believe that artificial general intelligence (AI / AGI) constitutes an existential risk by default, that is of course an important question.

First of all, even if the Singularity Institute (SI / SIAI) were the best choice for anyone trying to mitigate risks associated with AI, this wouldn’t invalidate the main argument of my last post: namely, that humans, who are unfriendly by default and highly fallible, are an underestimated and neglected part of the difficulty in ensuring a “successful AGI singularity”.

Unfriendly Humans

One of the main arguments put forth by SI is that intelligence does not imply benevolence. If SI told me that I shouldn’t be worried because they are smart enough not to lie about their motives and not to delude themselves about morality; if they told me that they are intelligent enough to judge their own intelligence and rationality; if they told me that they are smart enough not only to ensure that anyone they might hire is equally intelligent and therefore trustworthy, but that their wisdom alone is enough to ensure the beneficent nature of their seed AI, then at best I would call them confused.

Consider the following incident: SI reported $118,803.00 in theft in 2009, resulting in a year-end asset balance lower than expected. So? Well, it isn’t much harder to steal code than it is to steal money from a bank account.

Given the nature of the research being conducted by SI, one of the first and most important steps would have to be thinking about adequate security measures.

If you believe that risks from AI are to be taken seriously, then you should demand that any organisation studying artificial general intelligence establish measures against third-party intrusion and industrial espionage that are at least on par with biosafety level 4, the level required for work with dangerous and exotic agents.

It might be the case that SI already employs various measures against the theft of sensitive information, yet any evidence that hints at weak security should be taken seriously. In particular, the possibility that potentially untrustworthy people can access critical material should be examined.

And this is the major complication in deciding whether to support SI: what if they are actually making things worse? If you are a potential donor interested in mitigating risks from AI, then before contributing money you will have to make sure that your contribution does not increase those risks even further.

In other words, even if SI is the only organisation seriously concerned with trying to save humanity from unfriendly AI, the cure might be worse than the disease. That leads me to a point I have made before: why you should probably be afraid of friendly AI.

Why you should be afraid of friendly AI

[Image: self-improving software warning]

In his post ‘Fake Utility Functions’, Eliezer Yudkowsky wrote:

Leave out just one of these values from a superintelligence, and even if you successfully include every other value, you could end up with a hyperexistential catastrophe, a fate worse than death.  If there’s a superintelligence that wants everything for us that we want for ourselves, except the human values relating to controlling your own life and achieving your own goals, that’s one of the oldest dystopias in the book.  (Jack Williamson’s “With Folded Hands”, in this case.)

Based on that line of reasoning, trying to create friendly AI means creating a complex utility function, which provides a lot of opportunity to fail and create a hell-world.

  • Unfriendly AI -> Ignores humans -> Humans dead
  • Broken Friendly AI -> Humans kept alive -> Humans suffering

Friendly AI is incredibly hard and complex. Complex systems can fail in complex ways. Agents that are a product of evolution have complex values. To satisfy complex values you need to meet complex circumstances. Therefore any attempt at friendly AI, which is incredibly complex, is likely to fail in unforeseeable ways. A half-baked, not-quite-friendly AI might create a living hell for the rest of time, increasing negative utility dramatically.
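To make this failure mode concrete, here is a minimal toy sketch of my own. It is not anything SI has published; the value terms, weights and candidate worlds are invented for illustration. It shows how silently dropping a single value term (here, a hypothetical “autonomy” term) changes which world an optimizer picks, without anything in the optimizer itself breaking:

```python
# Toy illustration only: every term, weight and candidate world below is
# made up. The point is that removing one value term from an otherwise
# intact utility function shifts the optimum to a dystopian outcome.

CANDIDATE_WORLDS = {
    # name: (alive, happiness, autonomy)
    "status quo":        (1.0, 0.6, 0.8),
    "benevolent prison": (1.0, 0.7, 0.0),  # the "With Folded Hands" scenario
    "extinction":        (0.0, 0.0, 0.0),
}

def utility(world, include_autonomy=True):
    alive, happiness, autonomy = CANDIDATE_WORLDS[world]
    score = 1.0 * alive + 1.0 * happiness
    if include_autonomy:
        score += 1.0 * autonomy
    return score

def best_world(include_autonomy=True):
    # The "optimizer": pick the world with the highest utility.
    return max(CANDIDATE_WORLDS, key=lambda w: utility(w, include_autonomy))

print(best_world(include_autonomy=True))   # -> "status quo"
print(best_world(include_autonomy=False))  # -> "benevolent prison"
```

The numbers are arbitrary; what matters is the shape of the failure. Nothing crashes and no error is raised. The outcome is simply selected by a utility function that no longer contains everything we care about.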

It seems to me that figuring out the percentage of hell-world outcomes, across all possible worlds where SI succeeds in either launching its own seed AI or influencing a third-party AI, should be high on its list of research priorities.

What is the probability of a flawed realization of human values that nonetheless leaves people happy, compared to that of a hell-world?
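As a rough illustration of why that ratio matters, consider a toy expected-value calculation. Every probability and utility below is an invented placeholder, not an estimate; the sketch only shows that a rare hell-world can dominate the sum if its negative utility is large enough:

```python
# Toy expected-value comparison with invented numbers: a small chance of a
# hell-world outweighs a large chance of success if the hell-world is bad
# enough, which is why the ratio of outcomes matters.

outcomes = {
    # name: (probability, utility)
    "friendly AI (success)":      (0.90,  1_000),
    "flawed but livable outcome": (0.09,    100),
    "hell-world":                 (0.01, -1_000_000),
}

expected_utility = sum(p * u for p, u in outcomes.values())
print(expected_utility)  # 900 + 9 - 10000 = -9091
```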

Here is a reply I received from Eliezer Yudkowsky on the above:

There’s not very much brief that I can say about this except “There’s a lot of glance-at-the-predicted-outcome safety-checking regimes that make this less likely if the utility function stays stable at all” and “No, there is not going to be a single sign bit in the utility function that a cosmic ray can hit.” I do think that if you’re trying to increase the ratio of heavens to hells, treating single dead planets as unimportant on larger scales, FAI development a la SIAI is still the best way to go; consider that e.g. there’s also such a thing as uploading, a set of branches that seems unlikely to come first in our own world but which I’d expect to have more heavens, fewer dead planets, more hells, and a somewhat worse heaven/hell ratio.

Judge for yourself whether you are satisfied by his reply.

What it takes

If you ask me, it is going to take a lot of transparency, verifiable security measures, and independent third-party inspection to be able to tell whether SI is a good choice for reducing existential risks from artificial general intelligence.

I feel deeply uncomfortable about SI’s mission to create friendly AI, much more so than about the prospect of a hypothetical paperclip maximizer transforming the universe into something devoid of human values and devoid of suffering.

I would feel much more comfortable if there were more transparency and, especially, some sort of oversight by people not associated with SI: if, for example, someone like David Pearce, Ben Goertzel, Douglas Hofstadter, or Holden Karnofsky were able to supervise everything SI does and were equipped with the right to veto certain actions.
