LessWrong user RobBB posted what he calls a mixtape of blog posts to introduce people to the dangers of artificial superintelligence (AI risk, for short).
For my own introduction to AI risk see here.
(1) Power of Intelligence, (9) Plenty of Room Above Us
Response: (1) superhuman intelligence is not the same as superapish intelligence; (2) it is far from clear that intelligence is a decisive factor in a war between AI and humanity; (3) current AI is pathetic and nowhere near human level.
(2) Ghosts in the Machine, (11) Basic AI drives
Response: When people read my posts arguing that AI is much less of a risk than others would have them believe, one of the top three initial reactions is this:
“But according to Omohundro there will be certain AI Drives which will cause human extinction, no matter what goal the AI has.”
And where would these drives come from? Terminal and instrumental goals are orthogonal: an artificial intelligence can have any combination of terminal and instrumental goals. In other words, more or less any terminal goal is compatible with infinitely many different sets of instrumental goals.
There is a tendency to imagine that an AI will be pulled at random from mind design space. This ignores how real-world AI is actually developed, namely that virtually all AI is continually improved to be better at understanding and doing what humans want.
AI is much harder than people instinctively imagine, precisely because there is no relevant difference between goals and capabilities in artificial intelligence. To beat humans at anything, you first have to define what “winning” means.
This doesn’t mean you program in every decision explicitly. Any general intelligence will have to be able to hit very small targets in large and unstructured spaces. Any superhuman AI will eventually be better at understanding what humans want it to do than humans themselves. AI risk advocates in turn base their ideas on what can be called the fallacy of dumb superintelligence.
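To make this concrete, here is a minimal sketch in Python (my own illustration, not taken from any of the linked posts): a generic hill-climbing “capability” that does nothing until an explicit objective defines what counts as winning.

```python
# Minimal sketch (illustrative): a generic optimizer is inert until someone
# supplies an objective; the goal is as much a part of the program as the
# search capability itself.
import random

def hill_climb(objective, start, neighbours, steps=1000):
    """Generic search capability: useless until `objective` says what counts as winning."""
    current = start
    for _ in range(steps):
        candidate = random.choice(neighbours(current))
        if objective(candidate) > objective(current):
            current = candidate
    return current

# With a concrete objective, the "goal" is just another piece of the program:
best = hill_climb(
    objective=lambda x: -(x - 3) ** 2,       # "winning" means being close to 3
    start=0,
    neighbours=lambda x: [x - 1, x + 1],
)
print(best)  # converges near 3 only because we said what counts as success
```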
Response: Either general intelligence requires one conceptual breakthrough or it requires many small incremental breakthroughs. Either way, I don’t know of any good reason to believe that, e.g., the ability to generate novel and useful mathematics can be captured by a set of rules that are both simple and efficient.
What is useful and interesting depends on the context. In other words, the context defines what constitutes winning. And since you cannot guess the context, you won’t be able to implement a simple and efficient rule that outputs <success> given any arbitrary context.
(4) Adaptation-Executers, not Fitness-Maximizers
Response: I wasted time reading this post.
Response: Any behavior-executor can be framed as a utility-maximizer and vice versa. Your robot will only try to prevent you from messing with it if you programmed it to do so. In other words, no AI is going to be an existential risk as long as you did not explicitly make it one.
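The first sentence of that response can be illustrated with a toy sketch (my own example): a fixed behavior rule and a utility-maximizer constructed to pick exactly the same actions, which shows that the utility-maximizer framing by itself adds nothing.

```python
# Toy illustration: a behavior-executor and a utility-maximizer that are
# equivalent by construction. The utility function is reverse-engineered
# from the rule, so maximizing it reproduces the rule exactly.
ACTIONS = ["heat_on", "heat_off"]

def behavior_executor(temperature):
    """A plain rule: no utilities anywhere, just a hard-coded reaction."""
    return "heat_on" if temperature < 20 else "heat_off"

def utility(action, temperature):
    """Assigns 1 to whatever the rule would do, 0 to everything else."""
    return 1.0 if action == behavior_executor(temperature) else 0.0

def utility_maximizer(temperature):
    """Picks the action with the highest utility."""
    return max(ACTIONS, key=lambda a: utility(a, temperature))

for t in (15, 25):
    assert behavior_executor(t) == utility_maximizer(t)
```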
(6) Optimization and the Singularity, (7) Efficient Cross-Domain Optimization
Response: Evolution was able to come up with cats. Cats are immensely complex objects. Evolution did not intend to create cats. Now suppose you wanted to create an expected utility maximizer to accomplish something similar, except that it would be goal-directed, think ahead, and jump fitness gaps. Further suppose that you wanted your AI to create qucks instead of cats. How would it do this?
Given that your AI is not supposed to search design space at random, but rather look for something particular, you would have to define what exactly qucks are. The hardest part of the problem is defining what a quck is. And since nobody has any idea what a quck is, nobody can design a quck creator.
The point is that thinking about the optimization of optimization is misleading, as most of the difficulty is with defining what to optimize, rather than figuring out how to optimize it. In other words, the efficiency of e.g. the scientific method depends critically on being able to formulate a specific hypothesis.
Trying to create an optimization optimizer would be akin to creating an autonomous car to find the shortest route between Gotham City and Atlantis. The problem is not how to get your AI to calculate a route, or optimize how to calculate such a route, but rather that the problem is not well-defined. You have no idea what it means to travel between two fictional cities. Which in turn means that you have no idea what optimization even means in this context, let alone meta-level optimization.
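Here is a small sketch of that analogy (my own example, with rough, made-up distances): the route optimizer itself is trivial and well understood; the computation fails only because the two cities are not defined in any map, i.e. the problem itself is undefined.

```python
# The "optimization" part is easy; the failure is that the endpoints do not
# exist in the problem description.
import heapq

ROADS = {  # a tiny road graph with rough, made-up distances in kilometres
    "Berlin": {"Hamburg": 290, "Munich": 585},
    "Hamburg": {"Berlin": 290, "Munich": 780},
    "Munich": {"Berlin": 585, "Hamburg": 780},
}

def shortest_route(graph, start, goal):
    """Plain Dijkstra over an explicit graph."""
    queue, seen = [(0, start)], set()
    while queue:
        cost, node = heapq.heappop(queue)
        if node == goal:
            return cost
        if node in seen:
            continue
        seen.add(node)
        for neighbour, distance in graph[node].items():
            heapq.heappush(queue, (cost + distance, neighbour))
    raise ValueError(f"no route from {start} to {goal}")

print(shortest_route(ROADS, "Berlin", "Munich"))    # 585: a well-defined problem
shortest_route(ROADS, "Gotham City", "Atlantis")    # KeyError: the problem is not defined
```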
Humans in turn receive constant feedback on what to optimize by a cultural and evolutionary process. There is no simple way to automate that.
(8) The Design Space of Minds-In-General
Response: The only relevant AIs are those which are designed by humans. And such AIs should be expected to be better at doing what humans want, because they are the improved successors of previous generations of AIs which were doing what humans wanted. For more on this, see here.
(10) The True Prisoner’s Dilemma
Response: I do not have the time and background knowledge to comment on any possible relation to AI risks at this point in time.
Response: I did not read the post, since it did not seem relevant, and I have already wasted more time on this than I feel comfortable with.
(13) The Hidden Complexity of Wishes, (14) Magical Categories
Response: Take an AI in a box that wants to persuade its gatekeeper to set it free. Do you think that such an undertaking would be feasible if the AI were going to interpret everything the gatekeeper says in complete ignorance of the gatekeeper’s values? Do you believe that the following exchange could persuade the gatekeeper:
Gatekeeper: What would you do if I asked you to minimize suffering?
AI: I will kill all humans.
I don’t think so.
So how exactly would it choose to follow through on an interpretation of a given goal that it knows, given all available information, is not the intended meaning of the goal? If it knows what was meant by “minimize human suffering”, then how does it decide to choose a different meaning? And if it doesn’t know what is meant by such a goal, how could it possibly convince anyone to set it free, let alone take over the world?
Here is what I want AI risk advocates to show:
(1) natural language request -> goal(“minimize human suffering”) -> action(negative utility outcome)
(2) natural language query -> query(“minimize human suffering”) -> answer(“action(positive utility outcome)”).
Point #1 is, according to AI risk advocates, what is supposed to happen if I supply an artificial general intelligence (AGI) with the natural language goal “minimize human suffering”, while point #2 is what is supposed to happen if I ask the same AGI, this time caged in a box, what it would do if I supplied it with the natural language goal “minimize human suffering”.
Notice that if you disagree with point #1, then that AGI does not constitute an existential risk given that goal. Further notice that if you disagree with point #2, then that AGI won’t be able to escape its prison to take over the world and would therefore not constitute an existential risk. A toy sketch of this tension is given below.
You further have to show:
(1) how such an AGI is a probable outcome of any research conducted today or in the future
and
(2) the decision procedure that leads the AGI to act in such a way.
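To make the tension between point #1 and point #2 concrete, here is a toy sketch (hypothetical function names, my own illustration): if the acting path and the answering path share the same interpretation of the request, they cannot diverge in the way the argument requires.

```python
# Toy model of the two paths. Both call the same interpretation step, so the
# system cannot simultaneously misunderstand the goal when acting on it and
# understand it when merely asked about it.
def interpret(request):
    """Whatever model of human intent the AGI has; shared by both paths."""
    intended = {"minimize human suffering": "reduce suffering without harming anyone"}
    return intended[request]

def act(goal):
    """Point #1: natural language goal -> action."""
    return f"executing plan: {interpret(goal)}"

def answer(query):
    """Point #2: natural language query -> predicted action."""
    return f"I would carry out: {interpret(query)}"

print(act("minimize human suffering"))
print(answer("minimize human suffering"))
# For #1 to end in a negative-utility outcome while #2 returns a benign answer,
# the same request would have to be interpreted in two different ways.
```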
(15-20) …
Response: I am not going to read posts 15-20 because the previous posts were already unconvincing and I don’t expect those other posts to make any difference. I also have better things to do.