What’s your numerical probability estimate that, assuming no one puts much work into stopping it, Unfriendly AI will seriously damage or destroy human civilization in the next few centuries?
Mine is…hmm…I don’t know. Maybe 50%? I’m not sure. I do know that if there were an asteroid nearby with the same probability of impacting Earth, I’d be running up to people and shaking them and shouting “WHY AREN’T WE BUILDING MORE ASTEROID DEFLECTORS?! WHAT’S WRONG WITH YOU PEOPLE?!” I don’t know if I believe in unconditional moral imperatives, but if there were a 50% chance of an asteroid striking Earth soon, or even a 10% chance, and no one was doing anything about it, I would at least feel an imperative conditional on some of my other beliefs to try to help stop it.
So maybe part of what the Sequences did for me was help calibrate my brain well enough so that I noticed the similarity between the asteroid and the AI case.
With UFAI, people’s estimates are about as divergent as with the Second Coming of Jesus Christ, ranging from impossible even in theory, through essentially impossible, all the way to almost certain.
Risky enough to bother about?
Below are some possible and actual positions people take with respect to AI risks in ascending order of perceived importance:
1. Someone should actively think about the issue in their spare time.
2. It wouldn’t be a waste of money if someone were paid to think about the issue.
3. It would be good to have a periodic conference to evaluate the issue and reassess the risk every year.
4. There should be a study group whose sole purpose is to think about the issue.
5. All relevant researchers should be made aware of the issue.
6. Relevant researchers should be actively cautious and think about the issue.
7. There should be an academic task force that actively tries to tackle the issue.
8. Money should actively be raised to finance an academic task force to solve the issue.
9. The general public should be made aware of the issue to gain public support.
10. The issue is of utmost importance. Everyone should consider contributing money to a group trying to solve the issue.
11. Relevant researchers who continue to work in their field, irrespective of any warnings, are actively endangering humanity.
12. This is crunch time. This is crunch time for the entire human species. And it’s crunch time not just for us, it’s crunch time for the intergalactic civilization whose existence depends on us. Everyone should contribute all but their minimal living expenses in support of the issue.
Personally, most of the time, I alternate between positions 3 and 4.
Asteroids versus unfriendly AI
You might argue that I would endorse position 12 if NASA told me that there was a 20% chance of an extinction-sized asteroid hitting Earth and that they needed money to deflect it. I would indeed. But that seems like a completely different scenario to me.
I would have to be able to assign more than an 80% probability to AI being an existential risk to endorse that position. I would further have to be highly confident that we will have to face that risk within this century and that the model uncertainty associated with my estimates is low.
That intuition stems from the fact that any estimates regarding AI risks are very likely to be wrong, whereas in the example case of an asteroid collision one could be much more confident in the 20% estimate, since the latter is based on empirical evidence while the former is inference-based and therefore much more error-prone.
I don’t think that the evidence allows anyone to take position 12, or even 11, and be even slightly confident about it.
I am also highly skeptical about using the expected value of a galactic civilization to claim otherwise. Because that reasoning will ultimately make you privilege unlikely high-utility outcomes over much more probable theories that are based on empirical evidence.
Expected utility maximization and complex values
One largely unacknowledged problem with our current grasp of rationality is the consequences of expected utility maximization for human nature and our complex values.
Does expected utility maximization have anything to do with being human?
We are still genuinely confused about what a person should do. We don’t even know how much sense that concept makes at all.
Those people who take AI risks seriously and who are currently involved in their mitigation seem to be disregarding many other activities that humans usually deem valuable, because the expected utility of saving the world outweighs the pursuit of other goals.
I do not disagree with that assessment but find it troubling.
The problem is: will there ever be anything but a single goal, one that can either be more effectively realized and optimized to yield the most utility, or whose associated expected utility simply outweighs all other values?
Assume that humanity managed to create a friendly AI (FAI). Given the enormous amount of resources that each human is poised to consume until the dark era of the universe, wouldn’t the same arguments that now suggest that we should contribute money to existential risk charities then suggest that we should donate our resources to the friendly AI? Our resources could enable it to find a way to either travel back in time, leave the universe or hack the matrix. Anything that could avert the end of the universe and allow the FAI to support many more agents has effectively infinite expected utility.
The sensible decision would be to concentrate on those scenarios with the highest expected utility now, e.g. solving friendly AI, and worry about those problems later. But the same argument will always work, and the question is also relevant to the nature of friendly AI and our ultimate goals.
Again, is expected utility maximization even compatible with our nature? I don’t think so.
Expected utility maximization leads to world states in which wireheading is favored, either directly or indirectly, by focusing solely on a single high-utility goal that outweighs all other goals.
This is because when you calculate the expected utility of various outcomes, you imagine impossible alternative actions. The alternatives are impossible because you have already precommitted to choosing the outcome with the largest expected utility. This has the following effects:
- You swap your complex values for a single terminal goal with the highest expected utility; your instrumental and terminal goals converge to become the expected utility formula.
- Your decision-making is eventually dominated by extremely small probabilities of obtaining vast utility.
Further consider that even seemingly insignificant inferences might exhibit hyperbolic growth in utility:
- There is no minimum amount of empirical evidence necessary to extrapolate the expected utility of an outcome.
- The extrapolation of counterfactual alternatives is unbounded, logical implications can reach out indefinitely without ever requiring new empirical evidence.
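The dominance effect described above can be made concrete with a Pascal’s-mugging-style calculation. All the numbers below are hypothetical and chosen purely for illustration; nothing here is an empirical estimate of any real risk.

```python
# A mundane action: a small donation with a well-evidenced payoff.
p_mundane = 0.99          # probability the donation helps at all
u_mundane = 10.0          # utility if it does

# A mugger-style offer: an astronomically unlikely, astronomically
# large payoff (3.0**64 is a stand-in for something like 3^^^^3,
# which is far too large to represent directly).
p_vast = 1e-20
u_vast = 3.0 ** 64

ev_mundane = p_mundane * u_mundane
ev_vast = p_vast * u_vast

# Naive expected value is dominated by the vast payoff no matter
# how tiny its probability.
print(ev_mundane)             # 9.9
print(ev_vast > ev_mundane)   # True
```

Note that making `p_vast` a billion times smaller changes nothing: the mugger can always quote a larger payoff, which is exactly the unbounded extrapolation the second bullet point describes.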
This highlights one of several fundamental problems with our current grasp of rationality.
Taking ideas too seriously
There are people who have become very disturbed and depressed by taking our current ideas of what is rational too seriously.
The field of formal rationality is relatively new and I believe that we would be well-advised to discount some of its logical implications that advocate extraordinary actions.
Our current methods seem to be biased in new and unexpected ways. Pascal’s mugging, the Lifespan Dilemma and blackmailing are just a few examples of how an agent built according to our current understanding of rationality could fail.
And that’s just what we already know about.
Our current theories are not enough to build an artificial general intelligence that will be reliable in helping us to achieve our values, even if those values could be thoroughly defined or were computable even in principle.
This raises the following question: if we are unable to build – and wouldn’t trust even if we could – a superhuman agent equipped with our current grasp of rationality to be reliable in extrapolating our volition, how can we trust ourselves to arrive at correct answers given what we know?
We should of course continue to use our best methods to decide what to do. But I believe that we should also draw a line somewhere when it comes to extraordinary implications.
Intuition, Rationality and Extraordinary Implications
It doesn’t feel to me like 3^^^^3 lives are really at stake, even at very tiny probability. I’d sooner question my grasp of “rationality” than give five dollars to a Pascal’s Mugger because I thought it was “rational”.
Holden Karnofsky is suggesting that in some cases we should follow the simple rule that “extraordinary claims require extraordinary evidence”.
I think that we should sometimes demand particular proof P; and if proof P is not available, then we should discount seemingly absurd or undesirable consequences even if our theories disagree.
I am not referring to the weirdness of the conclusions but the foreseeable scope of the consequences of being wrong about them. We should be careful in using the implied scope of certain conclusions to outweigh their low probability. I believe that we should be risk averse in a way that assigns more weight to the consequences of our conclusions being wrong than being right.
As an example, take the idea of quantum suicide, and ignore for a moment that it does not make sense because it reduces your measure.
I wouldn’t commit quantum suicide even given a high confidence in the many-worlds interpretation of quantum mechanics being true. Logical implications just don’t seem enough in some cases.
To be clear, extrapolations work and often are the best we can do. But since there are problems such as the ones mentioned above, that we perceive to be undesirable and that lead to absurd actions and uncomputable consequences, I think it is reasonable to define some upper and lower bounds regarding the use and scope of certain heuristics.
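One way to picture such upper and lower bounds is a sketch like the following. The probability floor and utility cap are hypothetical knobs invented for illustration; they are exactly the kind of arbitrary lines discussed below, not values derived from any theory.

```python
# Hypothetical bounds: refuse outcomes below a probability floor and
# cap utilities at a ceiling. Both constants are arbitrary choices.
P_FLOOR = 1e-6
U_CAP = 1e6

def bounded_expected_utility(outcomes):
    """outcomes: iterable of (probability, utility) pairs."""
    total = 0.0
    for p, u in outcomes:
        if p < P_FLOOR:
            continue                  # refuse the extrapolation outright
        total += p * min(u, U_CAP)    # cap the payoff
    return total

# A mugger-style offer contributes nothing once bounded; only the
# well-evidenced mundane outcome survives.
offers = [(0.99, 10.0), (1e-20, 3.0 ** 64)]
print(bounded_expected_utility(offers))   # 9.9
```

The point of the sketch is not that these particular bounds are right, but that any such rule trades theoretical purity for robustness against the failure modes mentioned above.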
How do we do that? My best guess right now is that we simply have to draw a lot of arbitrary lines and arbitrarily refuse some steps.
Use your intuition
We talk an idealistic talk, but walk a practical walk, and try to avoid walking our talk or talking our walk.
— Robin Hanson, Beware Commitment
Actually we are already doing just that. Even most of those people who claim otherwise. We simply use our intuition, our gut feelings.
We are not going to stop pursuing whatever terminal goal we have chosen just because someone promises us even more utility if we do what that person wants. We are not going to stop loving our girlfriend just because there are other people who do not approve of our relationship and who together would experience more happiness if we broke up than the combined happiness of us and our girlfriend being in love.
We informally establish upper and lower bounds all the time.
Taking into account considerations of vast utility or low probability quickly leads to chaos-theoretic considerations like the butterfly effect. As computationally bounded and psychologically unstable agents, we are unable to cope with that. Consequently there is no other way than to neglect such considerations under extreme uncertainty.
Until the problems are resolved, or rationality is sufficiently established, I believe that we should continue to put vastly more weight on empirical evidence and our intuition than on logical implications, if only because we still lack the necessary insights to trust our comprehension and judgement of the various underlying concepts and methods we used to arrive at those implications in the first place.
Relativity is less wrong than Newtonian mechanics but it still breaks down in describing singularities including the very beginning of the universe.
Bayes’ Theorem, the expected utility formula, and Solomonoff induction are all technically correct. But being able to prove something mathematically doesn’t prove its relation to reality.
It seems to me that our notion of rationality is not the last word on the topic and that we shouldn’t act as if it was.