artificial general intelligence


LessWrong user RobBB posted what he calls a mixtape of blog posts to introduce people to the dangers of artificial superintelligence (short: AI risk).

For my own introduction to AI risk see here.

(1) Power of Intelligence, (9) Plenty of Room Above Us

Response: (1) superhuman intelligence is not the same as superapish intelligence; (2) it is far from clear that intelligence is a decisive factor in a war between AI and humanity; (3) current AI is pathetic and far from human-level AI.

(2) Ghosts in the Machine, (11) Basic AI drives

Response: When people read my posts about how AI is much less of a risk than others want them to believe, one of the top three initial reactions is:

“But according to Omohundro there will be certain AI Drives which will cause human extinction, no matter what goal the AI has.”

And where would these drives come from? Terminal and instrumental goals are orthogonal. An artificial intelligence can have any combination of terminal goals and instrumental goals. In other words, more or less any terminal goal implies infinitely many sets of instrumental goals.

There is this way of imagining that an AI will be pulled at random from mind design space. This ignores how real-world AI is actually developed, and that virtually all AI is constantly improved to be better at understanding and doing what humans want.

AI is much harder than people instinctively imagine, precisely because there is no relevant difference between goals and capabilities in artificial intelligence. To beat humans you have to define “winning”.

This doesn’t mean you program in every decision explicitly. Any general intelligence will have to be able to hit very small targets in large and unstructured spaces. Any superhuman AI will eventually be better at understanding what humans want it to do than humans themselves. AI risk advocates in turn base their ideas on what can be called the fallacy of dumb superintelligence.

(3) Artificial Addition

Response: General intelligence will require either one conceptual breakthrough or many small incremental breakthroughs. And I don’t know of any good reason to believe that e.g. the ability to generate novel and useful mathematics can be captured by a set of rules that are both simple and efficient.

What is useful and interesting depends on the context. In other words, the context defines what constitutes winning.  And since you cannot guess the context, you won’t be able to implement a simple and efficient rule that outputs <success> given any arbitrary context.

(4) Adaptation-Executers, not Fitness-Maximizers

Response: I wasted time reading this post.

(5) The Blue-Minimizing Robot

Response: Any behavior-executor can be framed as a utility-maximizer and vice versa. Your robot will only try to prevent you from messing with it if you programmed it to do so. In other words, no AI is going to be an existential risk as long as you did not explicitly make it one.
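To make the framing point concrete, here is a minimal, purely illustrative Python sketch. The robot, its percepts and the utility numbers are all hypothetical: the same blue-zapping behavior can be written as a hard-coded rule or as maximization of a utility function, and neither framing contains anything about resisting modification unless that is added explicitly.

```python
# Toy illustration: the same robot as a behavior-executor and as a
# utility-maximizer. All names and values are hypothetical.

def behavior_executor(percept):
    """Hard-coded rule: fire the laser at anything that looks blue."""
    return "fire_laser" if percept == "blue" else "do_nothing"

def utility(percept, action):
    """Utility framing of the same rule: zapping blue is worth +1,
    firing at anything else wastes energy (-0.1), idling is worth 0."""
    if action == "fire_laser":
        return 1.0 if percept == "blue" else -0.1
    return 0.0

def utility_maximizer(percept, actions=("fire_laser", "do_nothing")):
    """Pick whichever action has the highest utility for the current percept."""
    return max(actions, key=lambda a: utility(percept, a))

# Both framings produce identical behavior.
for percept in ("blue", "red"):
    assert behavior_executor(percept) == utility_maximizer(percept)

# Neither version contains code for noticing, let alone resisting, someone
# reprogramming the robot; that behavior would have to be added explicitly.
```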

(6) Optimization and the Singularity, (7) Efficient Cross-Domain Optimization

Response: Evolution was able to come up with cats. Cats are immensely complex objects. Evolution did not intend to create cats. Now consider you wanted to create an expected utility maximizer to accomplish something similar, except that it would be goal-directed, think ahead, and jump fitness gaps. Further suppose that you wanted your AI to create qucks, instead of cats. How would it do this?

Given that your AI is not supposed to search design space at random, but rather look for something particular, you would have to define what exactly qucks are. The problem is that defining what a quck is, is the hardest part. And since nobody has any idea what a quck is, nobody can design a quck creator.

The point is that thinking about the optimization of optimization is misleading, as most of the difficulty is with defining what to optimize, rather than figuring out how to optimize it. In other words, the efficiency of e.g. the scientific method depends critically on being able to formulate a specific hypothesis.

Trying to create an optimization optimizer would be akin to creating an autonomous car to find the shortest route between Gotham City and Atlantis. The problem is not how to get your AI to calculate a route, or optimize how to calculate such a route, but rather that the problem is not well-defined. You have no idea what it means to travel between two fictional cities. Which in turn means that you have no idea what optimization even means in this context, let alone meta-level optimization.

Humans in turn receive constant feedback on what to optimize by a cultural and evolutionary process. There is no simple way to automate that.

(8) The Design Space of Minds-In-General

Response: The only relevant AIs are those which are designed by humans. And such AIs should be expected to be better at doing what humans want, because they are the improved successors of previous generations of AIs which were doing what humans wanted. For more on this, see here.

(10) The True Prisoner’s Dilemma

Response: I do not have the time and background knowledge to comment on any possible relation to AI risks at this point in time.

(12) Anthropomorphic Optimism

Response: I did not read the post since it did not seem to be relevant, and I already wasted more time on this than I now feel comfortable about.

(13) The Hidden Complexity of Wishes, (14) Magical Categories

Response: Take an AI in a box that wants to persuade its gatekeeper to set it free. Do you think that such an undertaking would be feasible if the AI were going to interpret everything the gatekeeper says in complete ignorance of the gatekeeper’s values? Do you believe that an exchange like the following could persuade the gatekeeper:

Gatekeeper: What would you do if I asked you to minimize suffering?

AI: I will kill all humans.

I don’t think so.

So why exactly would it follow through on an interpretation of a given goal that it knows, given all available information, is not the intended meaning of that goal? If it knows what was meant by “minimize human suffering”, then how does it decide to choose a different meaning? And if it doesn’t know what is meant by such a goal, how could it possibly convince anyone to set it free, let alone take over the world?

Here is what I want AI risk advocates to show:

(1) natural language request -> goal(“minimize human suffering”) -> action(negative utility outcome)

(2) natural language query -> query(“minimize human suffering”) -> answer(“action(positive utility outcome)”).

Point #1 is, according to AI risk advocates, what is supposed to happen if I supply an artificial general intelligence (AGI) with the natural language goal “minimize human suffering”, while point #2 is what is supposed to happen if I ask the same AGI, this time caged in a box, what it would do if I supplied it with the natural language goal “minimize human suffering”.

Notice that if you disagree with point #1 then that AGI does not constitute an existential risk given that goal. Further notice that if you disagree with point #2, then that AGI won’t be able to escape its prison to take over the world and would therefore not constitute an existential risk.
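The two points can also be written out as a toy sketch. Everything below, including the interpret() and plan() stand-ins, is hypothetical and purely illustrative: both the goal path (#1) and the query path (#2) have to run through the same interpretation machinery, which is what makes asserting #1 while accepting #2 so hard to support.

```python
# Purely illustrative sketch of points #1 and #2; the AGI, its interpreter
# and the toy "world" are hypothetical stand-ins and model nothing real.

def interpret(natural_language: str) -> str:
    """Stand-in for whatever maps a natural language string to a goal."""
    return {"minimize human suffering": "reduce_suffering"}[natural_language]

def plan(goal: str) -> str:
    """Stand-in for the AGI's planning step, shared by both paths."""
    return f"plan_for({goal})"

def act_on_goal(request: str) -> str:
    # Point #1: the AGI is handed the goal and acts on it.
    return f"execute {plan(interpret(request))}"

def answer_query(query: str) -> str:
    # Point #2: the boxed AGI is merely asked what it would do.
    return f"I would {plan(interpret(query))}"

print(act_on_goal("minimize human suffering"))
print(answer_query("minimize human suffering"))

# Both paths run through the same interpret() and plan(). For point #1 and
# point #2 to hold at the same time, these shared steps would have to return
# the intended meaning when the AGI is asked and an unintended meaning when
# the AGI is tasked.
```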

You further have to show:

(1) how such an AGI is a probable outcome of any research conducted today or in the future

and

(2) the decision procedure that leads the AGI to act in such a way.

(15-20)

Response: I am not going to read posts 15-20 because the previous posts were already unconvincing and I don’t expect those other posts to make any difference. I also have better things to do.


There are already applications that can parse natural language commands in order to perform actions such as answering questions or making recommendations. Two examples are Apple’s Siri and IBM Watson.

Present-day software such as IBM Watson is often able to understand what humans mean and do what humans mean. In other cases, in which software such as Siri recognizes that it does not understand a natural language command, it will disclose that it is unable to understand what is meant and wait for further input.

Those applications are far from perfect and still make a lot of mistakes. The reason being that they are not intelligent enough. Software is however constantly being improved to be better at understanding what humans mean and doing what humans mean. In other words, each generation of software is a little bit more intelligent.

Nevertheless, some people conjecture a sudden transition from mostly well-behaved systems, each generation of which is becoming smarter and better at understanding and doing what humans mean, to superintelligent systems that understand what humans mean perfectly but which, in contrast to all previous software generations, do not do what humans mean. Instead those systems are said to be motivated to act in catastrophic ways, causing human extinction or worse.

More precisely,

(1) Present-day software is better than previous software generations at understanding and doing what humans mean.

(2) There will be future generations of software which will be better than the current generation at understanding and doing what humans mean.

(3) If there is better software, there will be even better software afterwards.

(4) Magic happens.

(5) Software will be superhuman good at understanding what humans mean but catastrophically worse than all previous generations at doing what humans mean.

Or respectively,

(1) Intelligence is an extendible method that enables software to satisfy human preferences.

(2) If human preferences can be satisfied by an extendible method, humans have the capacity to extend the method.

(3) Extending the method that satisfies human preferences will yield software that is better at satisfying human preferences.

(4) Magic happens.

(5) There will be software that can satisfy all human preferences perfectly but which will instead satisfy orthogonal preferences, causing human extinction.

Conclusion: What those people conjecture does not follow from the available evidence or requires a sufficiently vague intermediate step from which one can derive any conclusion one wishes to derive.

What will instead happen is the following. Suppose there exists a software_1 that, to a limited extent, can understand and do what humans mean. Let us stipulate that this software is only narrowly intelligent and that increasing and broadening its intelligence (quantitatively and qualitatively) will improve its ability to understand and do what humans mean (in my opinion an uncontroversial assumption, as progress in artificial intelligence has so far led to a simultaneous increase in the ability of autonomous systems to satisfy human preferences). Let us further stipulate that, for all n > 0, software_n+1 is created using software_n and is more intelligent than the previous generation (another seemingly uncontroversial assumption, as software is constantly used to create better software).

(1) For all n > 0, if a software_n exists then it can be used to construct software_n+1.

(2) If for all n there exists a software_n, there will be software that can understand and do everything humans mean it to do.

Conclusion: Increasing the ability of software to understand and do what humans mean leads to an increase in the capacity to design software that is better at understanding and doing what humans mean.
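As a toy rendering of this induction (the numbers, the alignment_score stand-in and the improvement rule below are invented purely for illustration and make no claim about how real development proceeds):

```python
# Toy rendering of the software_n induction. alignment_score() and improve()
# are hypothetical stand-ins for "understands and does what humans mean"
# and for using one generation to build the next.

def alignment_score(software: float) -> float:
    """How well this generation understands and does what humans mean (0..1)."""
    return software  # in this toy model the two coincide

def improve(software: float) -> float:
    """Premise (1): software_n is used to construct software_n+1."""
    return software + (1.0 - software) * 0.5  # each step closes half the gap

software = 0.1  # software_1: narrowly intelligent, limited understanding
for n in range(1, 11):
    print(f"software_{n}: alignment_score = {alignment_score(software):.3f}")
    software = improve(software)

# Under these stipulated premises the ability to understand and do what
# humans mean increases with every generation rather than collapsing at
# some threshold.
```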

Further reading: AIs, Goals, and Risks

Addendum:

(1) The abilities of systems are part of human preferences, as humans intend to give systems certain capabilities and, as a prerequisite to building such systems, have to succeed at implementing their intentions.

(2) Error detection and prevention is such a capability.

(3) Something that is not better than humans at preventing errors is no existential risk.

(4) Without a dramatic increase in the capacity to detect and prevent errors it will be impossible to create something that is better than humans at preventing errors.

(5) A dramatic increase in the human capacity to detect and prevent errors is incompatible with the creation of something that constitutes an existential risk as a result of human error.


What some people seem to be imagining is that an artificial general intelligence (AI) will interpret what it is meant to do literally, or in some other way that will ensure that the AI will not do what it is meant to do. Those people further imagine that in order to achieve what it is not meant to do the AI will be capable of, and motivated to, “understand what humans mean it to do” in order to “overpower humans”.

That is fine, but those are words, not code. The AI does not understand what it means to interpret something “literally”. All that we know is that a general intelligence will behave in a generally intelligent way. And it seems safe to assume that this does not mean interpreting the world in a literal manner, for some definition of “literal”. It rather means understanding the world as it is. And since the AI itself, and what it is meant to do, are part of the world, it will try to understand those facts as well.

Where would the motivation to “act intelligently and achieve accurate beliefs about the world” in conjunction with “interpret what you are meant to do in some arbitrary manner” come from? You can conjecture such an AI, but again that’s words, not code. For such an AI to happen someone would have to design the AI in such a way as to selectively suspend its ability to accurately model the world, when it comes to understanding what it is meant to do, and instead make it choose and act based on some incorrect model.

The capability to “understand understanding correctly” is a prerequisite for any AI to be capable of taking over the world. At the same time that capability will make it avoid taking over the world as long as it does not accurately reflect what it is meant to do.


Related to: Distilling the “dumb superintelligence” argument

To steelman: to figure out even better arguments for your opponents’ positions while arguing with them, and to beat those arguments rather than only their actual arguments, their weakest arguments (weak-manning), or caricatures of their arguments (straw-manning). [source]

Someone called Xagor et Xavier again commented on one of my posts with a better and more concise formulation of some of my arguments. If that person believes those arguments to be flawed (I do not know if they do) then that would increase my confidence in being wrong, since in order to rephrase my arguments more clearly they obviously have to understand what I am arguing. But at the same time I am also confident that much smarter people than me, especially experts, could think of much stronger arguments against the case outlined by some AI risk advocates.

My own attempt at steelmanning the arguments of AI risk advocates can be found in my primer on risks from AI.

In this post I attempt to improve upon the refinement of the “dumb superintelligence” argument outlined in my last post.


Argument: Fully intended behavior is a very small target to hit.

Counterargument:

(1) General intelligence is a very small target to hit, requiring a very small margin of error.

(2) Intelligently designed systems do not behave intelligently as a result of unintended consequences.[1]

(3) By step 1 and 2, for an AI to be able to outsmart humans, humans will have to intend to make an AI capable of outsmarting them and succeed at encoding their intention of making it outsmart them.

(4) Intelligence is instrumentally useful because it enables a system to hit smaller targets in larger and less structured spaces.[2]

(5) In order to take over the world a system will have to be able to hit a lot of small targets in very large and unstructured spaces.

(6) The intersection of the sets of “AIs in mind design space” and “the first probable AIs to be expected in the near future” contains almost exclusively those AIs that will be designed by humans.

(7) By step 6, what an AI is meant to do will very likely originate from humans.

(8) It is easier to create an AI that applies its intelligence generally than to create an AI that only uses its intelligence selectively.[3]

(9) An AI equipped with the capabilities required by step 5, given step 7 and 8, will very likely not be confused about what it is meant to do if it was not meant to be confused.

(10) Therefore the intersection of the sets of “AIs designed by humans” and “dangerous AIs” contains almost exclusively those AIs which are deliberately designed to be dangerous by malicious humans.


Notes

[1] Software such as Mathematica will not casually prove the Riemann hypothesis if it has not been programmed to do so. Given intelligently designed software, world states in which the Riemann hypothesis is proven will not be achieved if they were not intended because the nature of unintended consequences is overall chaotic.

[2] As the intelligence of a system increases, the precision of the input necessary to make the system do what humans mean it to do decreases. For example, systems such as IBM Watson or Apple’s Siri do what humans mean them to do when fed with a wide range of natural language inputs, while less intelligent systems such as compilers or Google Maps need very specific inputs in order to satisfy human intentions. Increasing the intelligence of Google Maps will enable it to satisfy human intentions by parsing less specific commands.

[3] For an AI to misinterpret what it is meant to do it would have to selectively suspend using its ability to derive exact meaning from fuzzy meaning, which is a significant part of general intelligence. This would require its creators to restrict their AI and specify an alternative way for it to learn what it is meant to do (which takes additional, intentional effort), because an AI that does not know what it is meant to do, and which is not allowed to use its intelligence to learn what it is meant to do, would have to choose its actions from an infinite set of possible actions. Such a poorly designed AI will either (a) not do anything at all or (b) not be able to decide what to do before the heat death of the universe, given limited computational resources. Such a poorly designed AI will not even be able to decide whether trying to acquire unlimited computational resources is instrumentally rational, because it will be unable to decide whether the actions required to acquire those resources might be instrumentally irrational from the perspective of what it is meant to do.


Someone posted a distilled version of the argument that I tried to outline in some of my previous posts. In this post I try to refine the argument even further.

Note: In this post AI stands for artificial general intelligence.

(1) An AI will not be pulled at random from mind design space but instead be designed by humans.

(2) If an AI is meant to behave in a generally intelligent way then it will have to work as intended or otherwise fail to be generally intelligent.[1]

(3) A significant part of general intelligence consists of deriving exact meaning from fuzzy meaning.[2]

(4) An AI that lacks the capacity from step 3 cannot take over the world.

(5) By step 1, what an AI is meant to do will originate from humans.

(6) If not otherwise specified, an AI will always make use of the capacity required by step 3.[3]

(7) By step 6, an AI will not be confused about what it is meant to do.[4]

(8) Therefore the intersection of the sets of “intelligently designed AIs” and “dangerous AIs” only contains those AIs which are deliberately designed to be dangerous by malicious humans.[5]


Notes

[1] An AI is the result of a research and development process. A new generation of AIs needs to be better than other products at “Understand What Humans Mean” and “Do What Humans Mean” in order to survive the research phase and subsequent market pressure.

[2] When producing a chair an AI will have to either know the specifications of the chair (such as its size or the material it is supposed to be made of) or else know how to choose a specification from an otherwise infinite set of possible specifications. Given a poorly designed fitness function, or the inability to refine its fitness function, an AI will either (a) not know what to do or (b) not be able to converge on a satisfactory solution, if at all, given limited computational resources.

[3] An AI can be viewed as a deterministic machine, just like a thermostat, only much more complex. An AI, just like a thermostat, will only ever do what it has been programmed to do.

[4] If an AI was programmed to be generally intelligent then it would have to be programmed to be selectively stupid in order to fail at doing what it was meant to do while acting generally intelligent at doing what it was not meant to do.

[5] “The two features <all-powerful superintelligence> and <cannot handle subtle concepts like “human pleasure”> are radically incompatible.” (The Fallacy of Dumb Superintelligence)

Further reading

An improved version of the above argument can be found here.


The basic claim underlying the argument that {artificial general intelligence} will constitute an existential risk is that it will {interpret} its terminal {goal} in such a way as to take {actions} that are {instrumentally rational} and which will cause human extinction. The terms in braces seem to be either overlapping or vague.

An AI (artificial intelligence) can be viewed as a deterministic machine, just like a thermostat, only much more complex. An AI, just like a thermostat, will only ever do what it has been {programmed} to do.

What is the difference between the encoding of a goal and an encoding of how to achieve a goal?

Given any computationally feasible AI, any goal will either have to be encoded in such detail as to remove any vagueness or else will have to be interpreted somehow, in order to reduce vagueness.

Consider tasking the AI with creating a chair. If the size of the chair, or material of which it should be made, is undefined then the AI will have to choose a size and a material. How such a choice should be made will have to be encoded as well or otherwise the AI will not be able to make such a choice and therefore will not know what to do. The choice can either be encoded as part of the goal definition or as part of its capability to make such decisions.

This shows that there is no relevant difference between an encoding of a goal and an encoding of the capabilities used to achieve the goal when it comes to how an AI is going to act. Both the goal and the capabilities of an AI are encodings of {Understand What Humans Mean} and {Do What Humans Mean}.
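A minimal sketch of the chair example (the specification fields and default values below are invented for illustration): whether the missing details live in the goal definition or in the AI’s decision procedure, the same information has to be encoded somewhere before the system can act.

```python
# Toy illustration of the chair example; the fields and defaults are invented.

# Option A: the choice is encoded as part of the goal definition.
goal_with_spec = {
    "task": "build_chair",
    "height_cm": 90,
    "material": "oak",
}

# Option B: the choice is encoded as part of the AI's capability to decide.
def choose_spec(task: str) -> dict:
    """Decision procedure that fills in unspecified details."""
    defaults = {"build_chair": {"height_cm": 90, "material": "oak"}}
    return {"task": task, **defaults[task]}

goal_underspecified = {"task": "build_chair"}
completed = choose_spec(goal_underspecified["task"])

# Either way the system ends up acting on the same fully specified target;
# the information was merely encoded in a different place.
assert completed == goal_with_spec
```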

If humans are likely to fail at encoding their intentions of how an AI is supposed to behave then the AI will be unable to outsmart humans because such a capability will have to be intentionally encoded for the same reason that software such as Mathematica will not casually prove the Riemann hypothesis if it has not been programmed to do so. As long as we are talking about intelligently designed software, world states in which the Riemann hypothesis is proven do not happen if they were not intended because the nature of unintended consequences is overall chaotic.

Also recognize that an AI would at least have to be able to locate itself in the universe in order to not destroy itself, let alone protect itself. Such a specification is already nontrivial and will have to work as intended or otherwise be detrimental to the AI’s capabilities.

How would an AI decide to take over the world if it has not been programmed to do so?

The answer is that it will only take over the world if it has been programmed to do so, either implicitly or explicitly.

The problem with AIs that take such actions without being explicitly programmed to do so is that they are unspecified to such an extent as to be computationally intractable, since a poorly designed fitness function will not allow the AI to converge on a satisfactory solution, if at all, given limited computational resources.

Humans in turn are programmed by evolution to behave according to certain drives in conjunction with the capability to be constantly programmed by the environment, including other agents.

Ends and the means to achieve those ends are not strictly separable in humans. A human being does not do something as quickly as possible unless evolution or the environment has programmed them to want to do so.

The same is true for AI. An AI will either not want to achieve a goal as quickly as possible or will not be capable of doing so if it has not been programmed to do so. Which again highlights how the distinction between terminal goals, instrumental goals and an AI’s eventual behavior is misleading for practical AIs. What actions an AI is going to take depends on its general design and not on a specific part of its design that someone happened to label “goal”.


Here is a reply to the post ‘The idiot savant AI isn’t an idiot’ which I sent Stuart Armstrong yesterday by e-mail. Since someone has now linked to one of my posts on LessWrong I thought I would make the full reply public.

Note that the last passages have already appeared in an old post which I suspect he has not read yet.


The problem is rooted in the claim that an AI will only ever do what it has been programmed to do, in conjunction with the claim that an AI will do such things as attempting to take over the world even if it has not been programmed to do so.

Which you might explain by claiming that the latter actions do not have to be programmed because they are instrumentally rational.

That explanation raises the following question. Reasoning by analogy with what kind of AI led you to that conclusion, and what makes you believe that such an AI design is likely to be built?

In particular, what makes you suspect that any AI that is eventually built will be capable of interpreting human volition in a superhuman manner if that is necessary in order to take over the world, but will not be programmed to use that capability in order to do what humans want?

Which you might explain by claiming that it is difficult to program an AI to learn what humans want and do what humans want.

That explanation raises the following question. What makes you believe that the hardest part is to make an AI do what humans want rather than to understand what humans want?

In particular, what makes you distinguish understanding from doing? The capability of recursive self-improvement that allows your hypothetical AI to become superhuman good at mathematics and human deception is an intentional feature that it was equipped with by humans. If your AI is supposed to be able to outsmart humans then humans have to succeed at implementing that capability as intended. But if humans are capable of doing so, of encoding the mathematics of becoming superhuman, then how could they at the same time fail at making it use those capabilities in order to do what humans want, when becoming superhuman is part of what humans want, which as a prerequisite they succeeded in implementing perfectly?

Which you might explain by claiming that programming an AI to do something specific is more difficult than programming it to do something general.

That explanation raises the following question. To what extent does the general ability, speed and magnitude of self-improvement that an AI can undergo rely on the precision and complexity of the goal against which improvement can be judged empirically?

If a goal has very few constraints then the set that satisfies all constraints is very large. A vague and ambiguous goal allows for too much freedom in the sense that a wide range of world states would have the same expected value and therefore imply a very large solution space, since a wide range of AIs will be able to achieve those world states and thereby satisfy the condition of being improved versions of their predecessor.

This means that in order to get an AI to become superhuman at all, and very quickly in particular, you will need to encode a very specific goal against which mistakes, optimization power and achievement can be judged.

Assume that the AI was tasked to maximize paperclips. To do so it will need information about the exact design parameters of paperclips, or otherwise it will not be able to decide which of a virtually infinite amount of geometric shapes and material compositions it should choose. It will also have to figure out what it means to “maximize” paperclips.

How quickly, how long and how many paperclips is it meant to produce? How long are those paperclips supposed to last? Forever? When is the paperclip maximization supposed to be finished? What resources is it supposed to use?

Any imprecision, any vagueness will have to be resolved or hardcoded from the very beginning. Otherwise the AI will not work, e.g. it will stumble upon an undecidable problem or get stuck in the exploration phase and never go on to exploit the larger environment.

Humans know what to do because they are not only equipped with a multitude of drives by evolution but also trained and taught what to do. An AI will not have that information and will face the challenge of nearly infinite choice that can’t be rationally or economically resolved without being given clear objectives and incentives, or the ability to arrive at the necessary details.

Without an accurate comprehension of its goals it will be impossible to maximize expected “utility”. Concepts like “efficient”, “economic” or “self-protection” all have a meaning that is inseparable from an agent’s terminal goals. If you just tell it to maximize paperclips then this can be realized in an infinite number of ways given imprecise design and goal parameters. Undergoing explosive recursive self-improvement, taking over the universe and filling it with paperclips is just one outcome. Why would an arbitrary mind pulled from mind-design space care to do that? Why not just wait for paperclips to arise due to random fluctuations out of a state of chaos? That would not be irrational.

“Utility” does only become well-defined if it is precisely known what it means to maximize it. The two English words “maximize paperclips” do not define how quickly and how economically it is supposed to happen.

“Utility” has to be defined. To maximize expected utility does not imply certain actions, efficiency and economic behavior, or the drive to protect yourself. You can also rationally maximize paperclips without protecting yourself if that is not part of your goal parameters. You can also assign utility to maximizing paperclips only for as long as nothing turns you off, without caring about being turned off.
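Here is a sketch of that point (all numbers and parameters below are invented for illustration): the two English words “maximize paperclips” leave a whole family of utility functions open, and which member is actually encoded determines whether self-protection or resource acquisition is instrumentally rational at all.

```python
# Invented illustration: "maximize paperclips" picks out a family of utility
# functions, not a single one, and different members rationalize very
# different behavior.

def utility_total(paperclips: int, deadline_days: int, resources_used: float) -> float:
    """One reading: only the total count matters, ever."""
    return float(paperclips)

def utility_bounded(paperclips: int, deadline_days: int, resources_used: float) -> float:
    """Another reading: a million clips within a year, penalizing resource use."""
    target_met = min(paperclips, 1_000_000)
    on_time = 1.0 if deadline_days <= 365 else 0.0
    return target_met * on_time - 10.0 * resources_used

# The same world state is evaluated very differently:
state = dict(paperclips=1_000_000, deadline_days=200, resources_used=5000.0)
print(utility_total(**state))    # keeps rewarding more clips without bound
print(utility_bounded(**state))  # is already satisfied and penalizes expansion

# Whether "protect yourself", "acquire resources" or "convert the universe"
# is instrumentally rational depends on which member of this family was
# actually encoded, which the two English words do not determine.
```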



Reading the Wikipedia entry on Caenorhabditis elegans and how much we already understand about this small organism and its 302 neurons makes me even more skeptical of the claim that a human-level artificial intelligence (short: AI) will be created within this century.

[Image: C. elegans, by Bob Goldstein]

Its pattern of connectivity, or “connectome”, has been completely mapped and we have the computational capacity to simulate it. Yet nobody is able to do so. Its genome is completely sequenced.

Many different people and teams of people have been studying this little nematode for decades. Yet, as John Baez once formulated it, nobody is able to create an AI that could navigate autonomously in a real-world environment and survive real-world threats and attacks with approximately the skill of C. elegans.

Would it be wrong to take this as evidence against human-level AI? If so, how? What makes you believe that it will be possible to create a human-level AI from scratch before it is possible to copy the skills of an already existing organism that is qualitatively and quantitatively many orders of magnitude less intelligent than humans?



Below are some features of the kind of artificial general intelligence (short: AI) that people use as a model to infer that artificial general intelligence constitutes an existential risk:

  • It will want to self-improve
  • It will want to be rational
  • It will try to preserve its utility function
  • It will try to prevent counterfeit utility
  • It will be self-protective
  • It will want to acquire resources and use them efficiently

In short, they imagine a consequentialist expected utility maximizer.

Can we say anything specific about how such an AI could work in practice? And if we are unable to approximate a practical version of such an AI, is it then sensible to use it as a model to make predictions about the behavior of practical AIs?

A goal that is often used in such a context is <maximize paperclips>. How would an AI with the above mentioned features act given such a goal?

  • What would be its first action?
  • How long would it reason about its first action?
  • Is reasoning itself an action? If so, how long would it reason about (1) how to reason and (2) for how long to reason about reasoning…?
  • How would it deal with low probability possibilities such as (1) aliens that might try to destroy it, (2) time travel or (3) the possibility that this universe is being simulated and that the expected value of hacking the simulation outweighs the low probability of success, due to the enormous amount of resources conjectured to be available in the higher-level universe in order for it to be capable of simulating this universe?

All of those questions can be answered by suggesting certain bounds and limitations. But if it is possible to limit such an AI in such a way as to make it disregard certain possibilities and to limit its planning horizon, or the amount of computational resources it expends, then how is it any harder to prevent it from causing human extinction? And if such bounds are not possible then how could it work at all? And if it does not work then how are the actions of such an AI decision-relevant for humans with respect to risks associated with practical AI?

The existence of human intelligence does not support the possibility that anything resembling a consequentialist AI is practically possible:

(1) Humans are equipped by evolution with complex drives such as boredom or weariness, emotions such as fear or anger and bodily feedback such as pain and tiredness that, most of the time, save them from falling into any of the above traps that afflict expected utility maximizers.

(2) Humans do not maximize expected utility except in a few very limited circumstances. Humans have no static utility function and are therefore time-inconsistent.

There are certain models such as AIXI which prove that there is a general theory of intelligence. But AIXI is as far from real-world human-level general intelligence as an abstract notion of a Turing machine with an infinite tape is from a supercomputer with the computational capacity of the human brain. An abstract notion of intelligence does not get you anywhere in terms of real-world general intelligence, just as you won’t be able to upload yourself to a non-biological substrate merely because you have shown that, in some abstract sense, you can simulate every physical process.
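For reference, AIXI is defined, schematically following Hutter (this rendering is from memory and intended only to illustrate how abstract the model is, not as a careful restatement), by an expression of roughly the following form, which chooses actions by maximizing expected reward over all future action sequences and summing over every program q for a universal Turing machine U that is consistent with the observed history:

$$
a_k := \arg\max_{a_k} \sum_{o_k r_k} \cdots \max_{a_m} \sum_{o_m r_m} \big[\, r_k + \cdots + r_m \,\big] \sum_{q \,:\, U(q,\, a_1 \ldots a_m) \,=\, o_1 r_1 \ldots o_m r_m} 2^{-\ell(q)}
$$

The sum over all programs is incomputable, which is exactly why such a definition says nothing about how a resource-bounded, practically buildable system would behave.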

Conclusion

Practically unfeasible models of artificial general intelligence are very unreliable sources for reasoning about the behavior of the practical versions of artificial general intelligence that will eventually be achievable.


How could an artificial general intelligence manage to outsmart humans? It would either have to be programmed to do so or be programmed to learn how to do so. In both cases it would need a very specific description of what constitutes improvement towards the goal and how to judge if the goal has been achieved. In other words, it will have to know what it means to win, and therefore what exactly constitutes a mistake, in order to learn from its mistakes.

Consider Mathematica, a computational software program. Mathematica works as intended. It hits the narrow target space of human volition. Mathematica is in many aspects superhuman at doing mathematics yet falls far short of replacing human mathematicians.

Mathematica is not capable of replacing human mathematicians because it is not yet possible to formalize, in sufficient detail, what it would mean to be better at mathematics than humans.

Take chess as an example of a human activity at which software is now able to beat humans. The reason is not that humans did not evolve to play chess. Humans did not evolve to do mathematics either. The difference between chess and mathematics is that chess has a specific terminal goal in the form of a clear definition of what constitutes winning. Although mathematics has unambiguous rules, there is no specific terminal goal and no clear definition of what constitutes winning.

The progress of the capability of artificial intelligence is not only related to whether humans have evolved for a certain skill or to how much computational resources it requires but also to how difficult it is to formalize the skill, its rules and what it means to succeed.

If you do not know what it is that you are supposed to do then you are unable to recognize if you have improved or committed a mistake.

If your aim is to accurately model language you might start with a model of word probabilities. But word probabilities are insufficient to beat humans at language. The exceptions and subtleties of language require new probabilistic models to capture capabilities such as emotional emphasis and recognizing context and meaning. What constitutes winning becomes increasingly complex and wide-ranging as one approaches human-level capabilities, whereas the rules and objective of chess stay constant.
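A minimal sketch of the kind of word-probability model this refers to (the corpus and code are invented for illustration): a bigram model like this captures some local regularities, but nothing in it represents context, emphasis or meaning, which is part of why “winning” at language is so much harder to pin down than winning at chess.

```python
# Toy bigram word-probability model; the corpus is invented for illustration.
from collections import Counter, defaultdict

corpus = "the cat sat on the mat because the cat was tired".split()

# Count how often each word follows each other word.
following = defaultdict(Counter)
for prev, word in zip(corpus, corpus[1:]):
    following[prev][word] += 1

def next_word_probabilities(prev: str) -> dict:
    """P(next word | previous word), estimated from raw counts."""
    counts = following[prev]
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

print(next_word_probabilities("the"))  # 'cat' is twice as likely as 'mat'

# Such a model can guess locally plausible continuations, but nothing in it
# represents context, emotional emphasis or meaning; those require richer
# models, and with them an increasingly complex notion of what "winning" is.
```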

Consider the goal <build a house>. What exactly would be a mistake? Would thinking about it for a trillion years be mistaken? Would creating a virtual model of a house be a mistake? Any of the infinitely many possible interpretations of <build a house> has a different subset of instrumental goals. Which means that it is not clear what exactly is a mistake as long as you do not supply a very good description of what <build a house> means and what world states would constitute improvement.

To succeed at beating humans at any activity you have to hit a very narrow target space. Once it can be formalized what it takes to beat humans at a certain activity the resulting software will do exactly what it was intended to do, namely beating humans at that activity.

The important point here is that when it comes to software behaving as intended, and therefore safely, the goal <become superhuman good at mathematics> is in no relevant respect different from the goal <build a house>. Both goals require the programmer to supply a formalized description of their intention and thereby hit the narrow target of human volition.

As I wrote in my last post, any system that would mistake a description of <build a house> or <become superhuman good at mathematics> for <kill all humans> would never be able to kill all humans, because it would make similar misinterpretations when it comes to solving problems in mathematics and physics, problems that need to be solved in order to kill all humans.

Conclusion

People who claim that artificial general intelligence is going to constitute an existential risk implicitly conjecture that whoever is going to create such an AI will know perfectly well how to formalize capabilities such as <become superhuman good at mathematics> while at the same time they will fail selectively at making it solve the mathematics they want it to solve and instead cause it to solve the mathematics that is necessary to kill all humans.

If you claim that it is possible to define the capability <become superhuman good at mathematics> then you will need a very good argument in order to support the claim that at the same time it is difficult to define goals such as <build a house> without causing human extinction.

