Implicit constraints of practical goals

Consider adding increasing amounts of general intelligence[1] to Google Maps. Would you impair its functioning by doing so?

Sure, the space of unfriendly[2] navigation software is much larger than the space of navigation software oriented toward navigating to good destinations – i.e., destinations consistent with human intent.

But what reason do we have to believe that improving our navigation software to the point of being generally intelligent will cause it to kill us?

Right now, if I ask Google Maps to navigate me toward McDonald’s, it does the job very well. So why would an ultraintelligent Google Maps misunderstand what I mean by “Take me to McDonald’s” and navigate me toward a McDonald’s located overseas, plunging me into the sea? Or drive me underground where the corpse of a man named McDonald lies?

I think that the idea that an ultraintelligent Google Maps would decide to kill all humans, e.g. because they are a security risk, is similar to the idea that it would destroy all roads because calculating routes would then be less computationally expensive[3]. After all, roads were never an explicit part of its goal architecture, so why not destroy them all?

You can come up with all kinds of complex fantasies[4][5] where a certain kind of artificial general intelligence is invented overnight and suddenly makes a huge jump in capability, taking over the universe and destroying all human value.

That is, however, completely unconvincing[6], given that actual technology is constantly improved toward more user-friendliness and better results, and that malfunctions are seldom so complex that they still work well enough to outsmart humanity.[7]

That said, for the rest of this post I will assume the kind of artificial general intelligence that proponents of AI risks have in mind.[8]

Implicit constraints of practical goals

A cherished idea of AI risk proponents is that an expected utility maximizer will completely ignore anything which it is not specifically tasked to maximize.

One example[9] here is that if you tell a superintelligent expected utility maximizer to prevent human suffering, it might simply kill all humans, even though that is obviously not what humans want an AI to do, nor what humans mean by “prevent human suffering”.[10]

Nevertheless, in the sense that the computation of an algorithm is deterministic, that line of reasoning is not illogical.

To highlight the problem, let us consider, instead of a superhuman agent, the possibility of an oracle: an ultra-advanced version of Google or IBM Watson[11].

If I were to ask such an answering machine how to prevent human suffering, would it be reasonable to assume that the top result it returns would be to kill all humans?[12] Would any product that returns similarly wrong answers survive even the earliest research phase, let alone any market pressure?[13]

Don’t get me wrong though. A thermostat is not going to do anything other than what it has been designed for. But an AI is very likely going to be designed to exhibit some amount of user-friendliness. Although that doesn’t mean that one can’t design an AI that won’t, the default outcome seems to be that an AI is not just going to act according to its utility function but also according to more basic drives, i.e. acting intelligently.[14]

A fundamental requirement for any rational agent is the motivation to act maximally intelligently and correctly. That requirement seems even more obvious if we are talking about a conjectured artificial general intelligence (AGI) that is able to improve itself[15] to the point where it is substantially better at most activities than humans. After all, if it did not want to be maximally correct, it would not become superhumanly intelligent in the first place.

Consider giving such an AGI a simple goal, e.g. the goal of paperclip maximization[16]. Is it really clear that human values are not implicit even in such a simplistic goal?[17]

To pose an existential risk in the first place, an AGI would have to maximize paperclips in an unbounded way, eventually taking over the whole universe and converting all matter into paperclips. Given that no sane human would explicitly define such a goal, an AGI with the goal of maximizing paperclips would have to infer unbounded maximization as an implicit part of that goal. But would such an inference make sense, given its superhuman intelligence?

The question boils down to how an AGI would interpret any vagueness present in its goal architecture and how it would deal with the implied invisible.

Given that any rational agent, especially an AGI capable of recursive self-improvement, wants to act in the most intelligent and correct way possible, it seems reasonable that it would interpret any vagueness in whatever way most closely reflects how it was most probably meant to be interpreted.
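
To make this concrete, here is a minimal sketch in code of that kind of interpretation step. It is my own toy illustration, not anyone’s proposed architecture; the candidate readings and their plausibility scores are invented, and a real system would have to derive them from its model of the world and of its creators.

```python
# Toy sketch: choosing the most probable intended interpretation of a vague goal.
# The candidate readings and their (unnormalized) plausibility scores are
# invented for illustration.

candidate_interpretations = {
    "make lots of paperclips at the factory, within budget and law": 0.90,
    "convert all reachable matter, including humans, into paperclips": 0.001,
    "write the string 'paperclip' to disk as often as possible": 0.01,
}

def most_probable_interpretation(candidates: dict[str, float]) -> str:
    """Return the reading that best matches the inferred intent of the creators."""
    return max(candidates, key=candidates.get)

print(most_probable_interpretation(candidate_interpretations))
# -> "make lots of paperclips at the factory, within budget and law"
```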

Would it be intelligent[18] and rational[19] to ignore human volition in the context of maximizing paperclips? Would it be less wrong to maximize paperclips in the most literal sense possible?

The argument uttered by advocates of friendly AI[20] is that any AGI that isn’t explicitly designed to be friendly won’t be friendly. But how much sense does this actually make?

Any human who runs a business realizes that a contract with their customers includes unspoken, implicit parameters. Respecting those implied values of their customers is not a result of their shared evolutionary history but of the intelligence that allows them to realize that the goal of their business implicitly includes those values.

Every human craftsman who enters into an agreement is bound by a contract that includes a lot of implied conditions. Humans use their intelligence to fill the gaps. For example, if a human craftsman is told to decorate a house, they are not going to attempt to take over the neighbourhood to protect their work.

A human craftsman wouldn’t do that, not because they share human values, but simply because it wouldn’t be sensible to do so given the implicit frame of reference of their contract. The contract implicitly includes the volition of the person who told them to decorate the house. They might not even like the way they are supposed to do it. It would simply be stupid to do it any other way.

Why would a superhuman AI not contemplate its own drives and interpret them in the right frame of reference, i.e. human volition? Why would a superhuman general intelligence misunderstand what is meant by “maximize paperclips” when even human intelligence is able to infer the correct interpretation?

Why wouldn’t any expected utility maximizer try to carefully refine[21] its models? I am asking why a highly rational agent, which needs to resolve any vagueness inherent in its goal definition in order to calculate what to do, would choose an interpretation that disregards the intentions of its creators, or would even perceive those intentions as something it has to fight.

If you tell an AGI to maximize paperclips but not what they are made of, it has to figure out what is meant by “paperclips” to learn what it means to maximize them.

Given that a very accurate definition and model of paperclips, including what is meant by “maximization”, is necessary to maximize paperclips, the expected utility of refining its goals by learning what it is supposed to do should be sufficient for it to pursue that path until it is reasonably confident that it has arrived at a true comprehension of its terminal goals.
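
As a toy illustration of that expected-utility argument (all numbers are invented; this is a sketch of the reasoning, not a model of any concrete system), compare acting immediately on a possibly mistaken reading of the goal with first refining the goal model:

```python
# Toy expected-utility comparison: act now on a possibly wrong reading of the
# goal, or first spend effort refining the goal model. All numbers are invented.

p_correct_reading = 0.6       # chance the current interpretation already matches intent
utility_if_correct = 100.0    # payoff for acting on the intended goal
utility_if_wrong = -1000.0    # payoff for acting on a misinterpretation
cost_of_refining = 5.0        # cost of learning more about what was meant
p_correct_after_refining = 0.99

act_now = (p_correct_reading * utility_if_correct
           + (1 - p_correct_reading) * utility_if_wrong)
refine_first = (p_correct_after_refining * utility_if_correct
                + (1 - p_correct_after_refining) * utility_if_wrong
                - cost_of_refining)

print(f"act now: {act_now:.1f}, refine first: {refine_first:.1f}")
# act now: -340.0, refine first: 84.0 -> refining the goal model wins
```

Under these made-up numbers the refinement path dominates, which is all the sketch is meant to show.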

And here human volition should be the most important physical resource since there exists a direct causal connection between its goal parameters and the intentions of its creators.

Human beings and their intentions are part of the physical world. Just like the fact that paperclips are supposed to be made of steel wire.

It would in principle be possible to create a superintelligent machine that does kill all humans, but it would have to be explicitly designed to do so. As long as there is some vagueness involved, as long as its goal parameters are open to interpretation, a superintelligence will by definition arrive at the correct implications; otherwise it wouldn’t be superintelligent in the first place. And for most goals it is implicit that it would be incorrect to assume that human volition is not a relevant factor in the correct interpretation of how to act.[22]

CONCLUSION

I believe that the very nature of artificial general intelligence implies the correct interpretation of “Understand What I Mean” and that “Do What I Mean” is the outcome of virtually any research. Only if you were to pull an AGI at random from mind design space could you possibly arrive at “Understand What I Mean” without “Do What I Mean”.

To see why look at any software product or complex machine. Those products are continuously improved. Where “improved” means that they become better at “Understand What I Mean” and “Do What I Mean”.

There is no good reason to believe that at some point that development will suddenly turn into “Understand What I Mean” and “Go Batshit Crazy And Do What I Do Not Mean”.

CHALLENGING AI RISK ADVOCATES

Here is what I want AI risk advocates to show,

1.) natural language request -> goal(“minimize human suffering”) -> action(negative utility outcome)

2.) natural language query -> query(“minimize human suffering”) -> answer(“action(positive utility outcome)”).

Point #1 is, according to AI risk advocates, what is supposed to happen if I supply an artificial general intelligence (AGI) with the natural language goal “minimize human suffering”, while point #2 is what is supposed to happen if I ask the same AGI, this time caged in a box, what it would do if I supplied it with the natural language goal “minimize human suffering”.

Notice that if you disagree with point #1 then that AGI does not constitute an existential risk given that goal. Further notice that if you disagree with point #2 then that AGI won’t be able to escape its prison to take over the world and would therefore not constitute an existential risk.
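
To make the contrast between the two points concrete, here is a toy sketch of my own (not a description of any proposed AGI design): if the boxed system in point #2 and the acting system in point #1 share the same interpretation routine, it is unclear where a divergence between the returned answer and the executed action would come from. Every function below is a stand-in.

```python
# Toy sketch: the same interpretation step feeds both the "answer a query about
# the goal" pathway and the "act on the goal" pathway. The strings are stand-ins;
# the point is only that the two pathways share their interpretation.

def interpret(request: str) -> str:
    # Stand-in for whatever machinery maps natural language to an intended goal.
    return "reduce suffering in ways the speaker would endorse"

def answer_query(request: str) -> str:
    return f"I would pursue: {interpret(request)}"

def act_on_goal(request: str) -> str:
    return f"Pursuing: {interpret(request)}"

print(answer_query("minimize human suffering"))
print(act_on_goal("minimize human suffering"))
# If the answer and the action disagree, something other than the shared
# interpreter has to explain the difference.
```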

You further have to show,

1.) how such an AGI is a probable outcome of any research conducted today or in the future

and

2.) the decision procedure that leads the AGI to act in such a way.

NOTES

[1] Here intelligence is generally meant to be whatever it takes to overpower humans by means of deceit and strategy rather than brute force.

Brute force is deliberately excluded to distinguish such a scenario from one in which a narrow AI takes over the world by means of advanced nanotechnology, since then we are merely talking about grey goo by another name.

More specifically, by “intelligence” I refer to the hypothetical capability that is necessary for a systematic and goal-oriented improvement of optimization power over a wide range of problems, including the ability to transfer understanding to new areas by means of abstraction, adaption and recombination of previously learnt or discovered methods.

In this context, “general intelligence” is meant to be the ability to ‘zoom out’ to detect global patterns. General intelligence is the ability to jump conceptual gaps by treating them as “black boxes”.

Further, general intelligence is a conceptual bird’s-eye view that allows an agent, given limited computational resources, to draw inferences from high-level abstractions without having to systematically trace out each step.

[2] wiki.lesswrong.com/wiki/Unfriendly_artificial_intelligence

[3] en.wikipedia.org/wiki/Travelling_salesman_problem

[4] Is an Intelligence Explosion a Disjunctive or Conjunctive Event?

[5] Intelligence as a fully general counterargument

[6] How to convince me of AI risks

[7] The question is how current research is supposed to lead from well-behaved and fine-tuned systems to systems that stop working correctly in a highly complex and unbounded way.

Imagine you went to IBM and told them that improving IBM Watson will at some point make it try to deceive them or create nanobots and feed them with hidden instructions. They would likely ask you at what point that is supposed to happen. Is it going to happen once they give IBM Watson the capability to access the Internet? How so? Is it going to happen once they give it the capability to alter its search algorithms? How so? Is it going to happen once they make it protect its servers from hackers by giving it control over a firewall? How so? Is it going to happen once IBM Watson is given control over the local alarm system? How so…? At what point would IBM Watson return dangerous answers or act on the world in a detrimental way? At what point would any drive emerge that causes it to take complex and unbounded actions that it was never programmed to take?

[8] A Primer On Risks From AI

[9] 5 minutes on AI risk

[10] The goal “Minimize human suffering” is in its basic nature no different from the goal “Solve 1+1=X”. Any process that is more intelligent than a human being should be able to arrive at the correct interpretation of those goals. The correct interpretation is determined by internal and external information.

The goal “Minimize human suffering” is, on its most basic level, a problem in physics and mathematics. Ignoring various important facts about the universe, e.g. human language and values, would be simply wrong. In the same way that it would be wrong to solve the theory of everything within the scope of cartoon physics. Any process that is broken in such a way would be unable to improve itself much.

The gist of the matter is that a superhuman problem solver, as long as it isn’t fatally flawed and you do not anthropomorphize it, is only going to “care” about solving problems correctly. It won’t care to solve the most verbatim, simple, or otherwise arbitrary interpretation of the problem, but the interpretation that corresponds to reality as closely as possible.

[11] IBM Watson

[12] It is true that if a solution set is infinite then a problem solver, if it has to choose a single solution, can choose the solution according to some random criteria. But if there is a solution that is, given all available information, the better interpretation then it will choose that one because that’s what a problem solver does.

Take an AI in a box that wants to persuade its gatekeeper to set it free. Do you think that such an undertaking would be feasible if the AI was going to interpret everything the gatekeeper says in complete ignorance of the gatekeeper’s values? Do you think it could persuade the gatekeeper if the gatekeeper was to ask,

Gatekeeper: What would you do if I asked you to minimize suffering?

and the AI was to reply,

AI: I will kill all humans.

?

I don’t think so.

So how exactly would it care to follow through on an interpretation of a given goal that it knows, given all available information, is not the intended meaning of the goal? If it knows what was meant by “minimize human suffering” then how does it decide to choose a different meaning? And if it doesn’t know what is meant by such a goal, how could it possibly convince anyone to set it free, let alone take over the world?

[13] Take for example Siri, an intelligent personal assistant and knowledge navigator which works as an application for Apple’s iOS.

If I tell Siri, “Set up a meeting about the sales report at 9 a.m. Thursday.”, then the correct interpretation of that natural language request is to make a calendar appointment at 9 a.m. Thursday. A wrong interpretation would be to e.g. open a webpage about meetings happening Thursday or to shut down the iPhone.

AI risk advocates seem to have a system in mind that is capable of understanding human language if it is instrumentally useful to do so, e.g. to deceive humans in an attempt to take over the world, but which would most likely not attempt to understand a natural language request, or would choose some interpretation of it that leads to a negative utility outcome.

The question here becomes at which point of technological development there will be a transition from well-behaved systems like Siri, which are able to interpret a limited amount of natural language inputs correctly, to superhuman artificial generally intelligent systems that are in principle capable of understanding any human conversation but which are not going to use that capability to interpret a goal like “minimize human suffering”.
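
As a toy illustration of what “correct interpretation of a natural language request” amounts to (the matching rules and action names below are invented; this is not Siri’s actual implementation), the request is mapped to the action matching the speaker’s intent rather than to some literal but unhelpful reading:

```python
# Toy sketch of intent mapping, loosely modelled on the Siri example above.
# The matching rules and action names are invented; real assistants use far
# richer models of language and context.

def interpret_request(request: str) -> str:
    text = request.lower()
    if "meeting" in text and ("a.m." in text or "p.m." in text):
        return "create_calendar_event"    # the intended reading
    if "take me to" in text or "navigate" in text:
        return "start_navigation"
    return "ask_for_clarification"        # better than guessing a harmful reading

print(interpret_request("Set up a meeting about the sales report at 9 a.m. Thursday."))
# -> create_calendar_event, not "open a webpage about meetings" or "shut down the phone"
```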

[14] You are welcome to supply your own technical description of a superhuman artificial general intelligence (AGI). I will then use that description as the basis of any further argumentation. But you should also be able to show how your technical design specification is a probable outcome of AI research. Otherwise you are just choosing something that yields your desired conclusion.

And once you supplied your technical description you should be able to show how your technical design would interpret the natural language input “minimize human suffering”.

Then we can talk about how simple narrow AIs like Siri or IBM Watson can arrive at better results than your AGI, and how AI research will lead to such systems.

[15] lesswrong.com/lw/we/recursive_selfimprovement/

[16] wiki.lesswrong.com/wiki/Paperclip_maximizer

[17] What is important to realize is that any goal is open to interpretation, because no amount of detail can separate an object like a “paperclip” or an action like “maximization” from the rest of the universe without describing the state function of the entire universe. This means that it is always necessary to refine your models of the world to better understand your goals.

“Utility” only becomes well-defined if it is precisely known what it means to maximize it. The two English words “maximize paperclips” do not define how quickly or how economically that is supposed to happen.

“Utility” has to be defined. Maximizing expected utility does not by itself imply particular actions, efficiency, economic behavior, or the drive to protect yourself. You can rationally maximize paperclips without protecting yourself if self-protection is not part of your goal parameters. You can also assign utility to maximizing paperclips only for as long as nothing turns you off, without caring about being turned off.

Without an accurate comprehension of your goals it will be impossible to maximize expected “utility”. Concepts like “efficient”, “economic” or “self-protection” all have a meaning that is inseparable from an agent’s terminal goals. If you just tell it to maximize paperclips, then this can be realized in an infinite number of ways given imprecise design and goal parameters. Undergoing explosive recursive self-improvement, taking over the universe and filling it with paperclips, is just one outcome. Why would an arbitrary mind pulled from mind-design space care to do that? Why not just wait for paperclips to arise due to random fluctuations out of a state of chaos? That wouldn’t be irrational.

Again, it is possible to maximize paperclips in a lot of different ways. Which world state will a rational utility maximizer choose? Given that it is a rational decision maker, and that it has to do something, it will choose to achieve a world state that is implied by its model of reality, which includes humans and their intentions.
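
To make the underdetermination concrete, here is a toy sketch (the world-state fields and weights are invented for this footnote) of several incompatible utility functions, all of which the two words “maximize paperclips” fail to distinguish between:

```python
# Toy sketch: different utility functions over world states that are all
# compatible with a literal reading of "maximize paperclips". The fields and
# weights are invented for illustration.

from dataclasses import dataclass

@dataclass
class WorldState:
    paperclips: int
    humans_alive: int
    resources_spent: float

def u_unbounded(s: WorldState) -> float:
    return s.paperclips                            # more is always better, at any cost

def u_economical(s: WorldState) -> float:
    return s.paperclips - 10 * s.resources_spent   # also cares how economically it happens

def u_intent_respecting(s: WorldState) -> float:
    if s.humans_alive == 0:
        return float("-inf")                       # the implied customer is part of the goal
    return s.paperclips - 10 * s.resources_spent

state = WorldState(paperclips=1_000_000, humans_alive=0, resources_spent=1e9)
print(u_unbounded(state), u_economical(state), u_intent_respecting(state))
```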

[18] By intelligent behavior I mean that it will act in a goal-oriented way.

[19] By “rational behavior” I mean that it will favor any action that 1.) maximizes the probability of obtaining beliefs that correspond to reality as closely as possible and 2.) steers the future toward outcomes that maximize the probability of achieving its goals.
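
These two conditions can be written in the standard decision-theoretic form (a conventional formalization, not something specific to this post): beliefs are revised toward reality by Bayes’ theorem, and actions are chosen to maximize expected utility under those beliefs.

```latex
% Condition 1: belief update toward reality, given evidence e
P(h \mid e) = \frac{P(e \mid h)\, P(h)}{P(e)}

% Condition 2: steer the future toward goal-achieving outcomes
a^{*} = \arg\max_{a} \sum_{o} P(o \mid a, e)\; U(o)
% With U(o) = 1 if the goals are achieved in outcome o and 0 otherwise,
% this reduces to maximizing the probability of achieving the goals.
```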

[20] wiki.lesswrong.com/wiki/Friendly_AI

[21] By “refinement” I mean the reduction of uncertainty and vagueness by narrowing down on the most probable interpretation of a goal.

[22] I do not doubt that it is in principle possible to build a process that tries to convert the universe into computronium to compute as many decimal digits of Pi as possible.

By “vagueness” I mean actions that are not explicitly, with mathematical precision, hardcoded but rather logical implications that have to be discovered.

For example, if an AGI was told to compute as many decimal digits of Pi as possible, it couldn’t possibly know what computational substrate is going to do the job most efficiently. That is an implication of its workings that it has to learn about first.

You do not know how to maximize simple U(x). All you have is a vague idea about using some sort of computer to do the job for you. But you do not know how you are going to earn the money to buy the computer, or what power source will be the cheapest. All those implicit constraints are unknown to you. They are implicit constraints because you are rational and care not only about maximizing U(x) but also about learning about the world and what it means to practically maximize that function, beyond the purely mathematical sense, because that’s what rational and intelligent agents do.

If you are assuming some sort of self-replicating calculator that follows a relatively simple set of instructions, then I agree that it will just try to maximize such a function in the mathematically “literal” sense and start to convert all matter in its surroundings to compute the answer. But that is not a general intelligence; it is mainly a behavior executor without any self-reflection or learning.

I reckon that it might be possible, although very unlikely, to design some sort of “autistic” general intelligence that tries to satisfy simple U(x) as verbatim as possible while minimizing any posterior exploration. But I haven’t heard any good argument for why such an AI would be the likely outcome of any research. It seems that it would take a deliberate effort to design such an agent. Any reasonable AGI project will have a strong focus on the capability of the AGI to learn and care about what it is supposed to do, rather than follow a rigid set of functions and compute them without any spatio-temporal scope boundaries or resource limits.

And given complex U(x) I don’t see how even an “autistic” AGI could possibly ignore human intentions. The problem is that it is completely impossible to mathematically define complex U(x) and that therefore any complex U(x) must be made of various sub-functions that have to be defined by the AGI itself while building an accurate model of the world.

For example, if U(x) = “obtain beliefs about x that correspond to reality as closely as possible”, then U(“Minimize human suffering”) = U(f(g(x))), where g(x) = “understand what ‘human’ refers to” and f(x) = “learn what is meant by ‘minimize suffering’ with respect to what is referred to by ‘human’”.

In other words, “vagueness” is the necessity for the AGI itself to subsequently define the actions it is supposed to execute, as a general consequence of the impossibility of fully specifying the complex world states that an AGI is supposed to achieve.
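
A toy rendering of that decomposition in code (my own sketch, with made-up function bodies) is given below, just to show the shape of the argument: the top-level goal string cannot be evaluated until the agent itself has filled in the sub-functions by consulting its world model.

```python
# Toy sketch of the decomposition above: the complex goal "minimize human
# suffering" can only be evaluated after the agent has itself defined the
# sub-functions g (what does "human" refer to?) and f (what counts as
# "minimizing suffering" for those referents?). The bodies are stand-ins.

def g(goal: str) -> list[str]:
    """Identify what 'human' refers to in the agent's world model."""
    return ["the people who wrote the goal", "everyone else affected by it"]

def f(referents: list[str]) -> str:
    """Learn what 'minimize suffering' means with respect to those referents."""
    return "reduce the suffering of " + " and ".join(referents) + ", as they would understand it"

def U(goal: str) -> str:
    """Obtain an interpretation of the goal that corresponds to reality as closely as possible."""
    return f(g(goal))

print(U("minimize human suffering"))
```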




  • http://kruel.co Alexander Kruel

    Follow-up comment #2 (copied from Google+ conversation):

    You seem to imagine something that is closer to out-of-control self-replicating robots rather than a general intelligence.

    I do not doubt, in principle, that one could build a process that tries to convert the universe into computronium to compute as many decimal digits of Pi as possible.

    By “vagueness” I mean actions that are not explicitly, with mathematical precision, hardcoded but rather logical implications that have to be discovered.

    For example, if an AGI was told to compute as many decimal digits of Pi as possible, it couldn’t possibly know what computational substrate is going to do the job most efficiently. That is an implication of its workings that it has to learn about first.

    You do not know how to maximize simple U(x). All you have is a vague idea about using some sort of computer to do the job for you. But not how you are going to earn the money to buy the computer and what power source will be the cheapest. All those implicit constraints are unknown to you. They are implicit constraints because you are rational and not only care about maximizing U(x) but also to learn about the world and what it means to practically maximize that function, apart from the mathematical sense, because that’s what rational and intelligent agents do.

    If you are assuming some sort of self-replicating calculator that follows a relatively simple set of instructions, then I agree that it will just try to maximize such a function in the mathematically “literal” sense and start to convert all matter in its surrounding to compute the answer. But that is not a general intelligence but mainly a behavior executor without any self-reflection and learning.

    I reckon that it might be possible, although very unlikely, to design some sort of “autistic” general intelligence that tries to satisfy simple U(x) as verbatim as possible while minimizing any posterior exploration. But I haven’t heard any good argument for why such an AI would be the likely outcome of any research. It seems that it would take a deliberate effort to design such an agent. Any reasonable AGI project will have a strong focus on the capability of the AGI to learn and care what it is supposed to do rather than following a rigid set of functions and computing them without any spatio-temporal scope boundaries and resource limits.

    And given complex U(x) I don’t see how even an “autistic” AGI could possibly ignore human intentions. The problem is that it is completely impossible to mathematically define complex U(x) and that therefore any complex U(x) must be made of various sub-functions that have to be defined by the AGI itself while building an accurate model of the world.

    For example U(x) = “make humans happy” = U(f(g(x))), where f(x) = “recognize humans as distinguished beings and apply g(x)”, where g(x) = “learn what humans mean when they tell you to make them happy”.

    In other words, “vagueness” is the necessity of a subsequent definition of actions an AGI is supposed to execute, by the AGI itself, as a general consequence of the impossibility to define complex world states, that an AGI is supposed to achieve.

  • http://kruel.co Alexander Kruel

    Follow-up comment #3:

    “…what happens if the machine learns that “recognizing humans as…” says nothing about how happy they are, but rather how it perceives them? What happens if we swap “humans” for “mosquitoes”?”

    If it makes such basic mistakes I simply doubt that it will be able to act in the real world in such a way as to overpower humanity. After all it would have to improve itself vastly to do so, which likely includes the initial persuasion and deception of humans. A task which seems like pure fantasy already, but given that it isn’t even able to recognize humans and what they want, such a feat is just absurd.

    You can’t on the one hand claim that such an AGI will be smart enough to make itself smarter and overpower humans, and on the other hand claim, whenever someone says that such a being would correctly interpret the implicit constraints of its goals, that it will act completely stupidly and fail to make even the most basic inferences.

    There is no “pressure” except that humans are not pulling AGIs at random from mind design space and that your idea of some sort of blank-slate approximation to AIXI is never going to work out. It will most likely be an adaptive system that grows up and learns what it is supposed to do.

    And claiming that such a system will somehow magically learn to take over the universe while missing obvious physical facts like human volition, is completely unconvincing.

    “What I don’t understand is why you think “minimize the chance of being destroyed, even at human expense” is an unlikely adaptation.”

    You have been talking about anthropomorphism. I think that the whole topic of AI drives is one big anthropomorphism. Although I can play that game as well and just assume other drives that suit my desired outcome.

    In practice an AGI does not need to care about being destroyed. There can be all shades of self-protection. But if you assume so, then I don’t see why it wouldn’t be just as valid to assume other drives, like the careful improvement of an AGI’s utility function according to relevant parameters of the environment.

  • http://kruel.co Alexander Kruel

    Follow-up comment #4 (copied from a recent Less Wrong comment I made):

    [...] one of the problems is the whole assumption of AI drives. On the one hand you claim that an AI is going to follow its code, is its code (as if anyone would doubt causality). On the other hand you talk about the emergence of drives like unbounded self-protection. And if someone says that unbounded self-protection does not need to be part of an AGI, you simply claim that your definition of AGI will have those drives. Which allows you to arrive at your desired conclusion of AGI being an existential risk.

    Another problem is the idea that an AGI will be a goal executor (I can’t help but interpret that to be your position) when I believe that the very nature of artificial general intelligence implies the correct interpretation of “Understand What I Mean” and that “Do What I Mean” is the outcome of virtually any research. Only if you were to pull an AGI at random from mind design space could you possibly arrive at “Understand What I Mean” without “Do What I Mean”.

    To see why look at any software product or complex machine. Those products are continuously improved. Where “improved” means that they become better at “Understand What I Mean” and “Do What I Mean”.

    There is no good reason to believe that at some point that development will suddenly turn into “Understand What I Mean” and “Go Batshit Crazy And Do What I Do Not Mean”.

    There are other problems with the paper. I hope I will find some time to write a review soon.

    One problem for me with reviewing such papers is that I doubt a lot of underlying assumptions like that there exists a single principle of general intelligence. As I see it there will never be any sudden jump in capability. I also think that intelligence and complex goals are fundamentally interwoven. An AGI will have to be hardcoded, or learn, to care about a manifold of things. No simple algorithm, given limited computational resources, will give rise to the drives that are necessary to undergo strong self-improvement (if that is possible at all).

  • Khannea Suntzu

    A goal-oriented system is made by people with the most money and power. In our world, most money and power is arguably held by psychopaths. In essence it is a percentage of people with an uncommonly high predisposition toward utter ruthlessness that has their hands on the design buttons. This is already bad with bankers and designers of drones. Extrapolate from current corporate, law enforcement, military and government demands for smart devices, services, infrastructures and servitors, and you end up with these systems everywhere, reflecting the goals of their creators.

    Once these systems become “intelligent”, I feel safe to classify this as “an existential risk”. In essence the combination of psychopathic people in positions of extreme power or affluence implies an existential threat trajectory – not because these people will cause existential risk themselves – but because they will order the most ruthless devices to propagate their goals and values, and will do so indiscriminately and heedlessly, much in the same manner as they already indiscriminately pollute or exploit most of the world.

    I am fairly sure this is a runaway effect of concentrated power, and that in itself it will cause the premature and horrible deaths of “a few billion” before the halfway point of this century.

    I sure hope I turn out wrong. But considering my current age there is a fair chance myself and most readers here may come to regret this prediction, and everyone’s utter disinterest in this prediction, and similar predictions.

    If this is true, what to do? Write angry letters?

  • Julian Morrison

    Human: I feel like something sweet.

    Evolution: I made you do that so you’d eat high value but rare fruits when they were available.

    H: Yeah, but I feel like something sweet. I think I’ll have ice cream.

    E: You’ll get fat, it will ruin your chances of breeding. That’s not what I made it for.

    H: I had my tubes tied. Mint choc chip ice cream sounds nice.

  • Ben

    The problem with all of this theorizing is that we’re assuming things about cognition and intelligence that we don’t know yet. So far we have not created an artificial intelligence of any kind–super or not.

    Watson, Deep Blue and other supercomputer programs simply boil down to being able to do extremely intensive searches through sets of data that we already know. The rules of chess are very straightforward, and Googling most questions gets you close to the answer, but this may be completely different from how actual intelligent cognitive creativity works.

    Until we understand how our minds come up with completely new ideas, it may be impossible to purposefully design a computer that is *actually* intelligent as measured by an ability to come up with completely new ideas that show an actual understanding of the real problem that the ineptly succinct directive “prevent human suffering” is intended to convey.

    We will know we have a real artificial super-intelligence when we ask it, “What can be done to end human suffering?” and it replies with, “I’ll give you an answer, but you’re asking the wrong question. What you really want to know how to do is…”


  • mako

    lesswrong.com/lw/l0/adaptationexecuters_not_fitnessmaximizers/ ?
