friendly AI


How do you guarantee that an artificial intelligence (AI) has a positive impact? Here, a positive impact might, for example, be defined as some sort of reflective equilibrium of humanity.

Let us label as “friendly” any agent, be it human or artificial, that has a positive impact.

The most important safety measures seem to be the following:

(1) Ensuring that an AI works as intended.

(2) Ensuring that the humans who create or use AI are friendly.

(3) Ensuring that an AI is friendly.

Points 1 and 2 are important, but not strictly necessary for point 3. Ideally, point 3 should be achieved through independent oversight (point 2), in combination with an independent verification of the AI’s behavior (point 1).

Note how point 1 is distinct from point 3. You could have an AI that is not friendly, one that does not actively pursue a positive impact, but whose overall impact is proven to be limited. This would be the case given a mathematical proof that such an unfriendly AI would, for example, (1) only run for N seconds, (2) only use predefined computational resources, and (3) only communicate with the outside world by outputting mathematical proofs of the behavior of improved versions of itself, proofs that are verifiable by humans.

Remarks: It should be much easier to prove that an AI is bounded than to prove that an AI will pursue a complex goal without unintended consequences. Such a confined AI could then be studied and used as a tool in order to ensure point 3.
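As a loose operational analogue of such confinement (runtime enforcement at the operating-system level, not the mathematical proof of boundedness discussed above), one might run an untrusted computation in a child process under hard CPU-time and memory limits. A minimal, Unix-only sketch in Python; the names run_confined and task are hypothetical:

```python
import os
import resource


def run_confined(task, cpu_seconds=5, max_bytes=256 * 1024 ** 2):
    """Run `task` in a forked child process under hard CPU-time and memory limits.

    Note: this merely enforces bounds at runtime; it is not a proof of
    boundedness in the sense discussed above.
    """
    pid = os.fork()
    if pid == 0:  # child process: apply the limits, run the task, then exit
        resource.setrlimit(resource.RLIMIT_CPU, (cpu_seconds, cpu_seconds))
        resource.setrlimit(resource.RLIMIT_AS, (max_bytes, max_bytes))
        try:
            task()
        finally:
            os._exit(0)
    _, status = os.waitpid(pid, 0)  # parent: wait for the child to finish
    return status
```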

The first version of such an unfriendly AI (uFAI_01) would be provably confined to only run for a limited amount of time, use a limited amount of resources, and output nothing but mathematical proofs of its own behavior. Once a sufficient level of confidence about its behavior has been reached, an improved version (uFAI_02) could then be designed. The domain of uFAI_02 would provably be modified versions of its source code (uFAI_N). Its range would provably be human-verifiable mathematical proofs of the behavior of uFAI_N, which it would provably output using a limited amount of resources. This process would then be iterated, up to an arbitrary level of confidence, until eventually a friendly AI is obtained.
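The control flow of that iteration can be summarized in a short sketch. Everything here is a hypothetical stand-in: iterate_verified_versions, run_confined_ai, and verify_proof name the bounded execution and the human-checkable verification step, not real components.

```python
def iterate_verified_versions(initial_source, rounds, run_confined_ai, verify_proof):
    """Sketch of the iteration described above.

    Each confined version runs under fixed bounds and may only output a
    proposed successor together with a proof about that successor's behavior.
    The successor is adopted only if the proof passes human-checkable
    verification; otherwise the process stops.
    """
    source = initial_source
    for _ in range(rounds):
        proposal, proof = run_confined_ai(source)  # bounded run; proof is the sole output
        if not verify_proof(proposal, proof):      # human-verifiable check
            break                                  # confidence not reached: stop here
        source = proposal                          # adopt uFAI_{n+1}
    return source
```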


For the sake of argument, suppose that AI risk advocates succeed in implementing an artificial general intelligence that protects and amplifies human values (friendly AI).

Such a friendly AI (FAI) would have to (1) disallow any entity smarter than itself that isn’t provably friendly, and (2) know exactly what humans value and how to protect and amplify those values in a way that humans desire.

How valuable would such an outcome be? Let’s look at a specific human value, doing philosophy, and its expected value in the context of a universe ruled by such an FAI.

I can see two possibilities:

(1) The FAI had to solve all of philosophy in order to do its job.

(2) The FAI did not have to solve philosophy but would in principle be capable of doing so.

Given either possibility, how much would humans value doing philosophy if all interesting questions either had already been answered or could easily be answered by the FAI?

That partly depends on whether it would be possible to just ask the FAI for any answer. But why would that not be possible? There seem to be two answers:

(1) The FAI learnt that humans don’t want it to answer such questions.

(2) The FAI was programmed to not answer such questions.

The first possibility seems to imply that humans want to figure out philosophy in a certain way, one that does not include just asking for an answer or looking it up. But how likely is this possibility? How many philosophers would wish that the Stanford Encyclopedia of Philosophy did not exist so that they could figure all of it out on their own?

The second possibility is itself problematic. In a universe ruled by an FAI, artificial general intelligence and friendly AI have obviously been solved. This means that people could either ask the FAI to alter itself in such a way that it could answer such questions, or implement a less capable AI that can answer philosophy questions. And if that isn’t allowed, which would mean that pretty much the whole field of machine learning would be forbidden, then people could just ask the FAI to improve them in such a way as to be capable of easily solving any philosophical puzzle.

To recapitulate the situation: given any human intellectual activity, not just philosophy, in a universe controlled by an FAI it should be possible to do at least one of the following:

(1) Directly ask the FAI for an answer to any question.

(2) Implement a superintelligence that could answer those questions.

(3) Ask to have your cognitive abilities improved in such a way as to easily answer those questions.

Whether or not the above possibilities are allowed, a wide range of human values would be dramatically reduced in either case. Either all human intellectual activity becomes as trivial as asking a question, or humans are forever stuck with the mental capabilities that evolution has equipped them with, while being forbidden to create another intelligence more capable than themselves.

The only way out that I can imagine is to choose ignorance: to ask the FAI to keep us oblivious of its existence and of how to create an FAI. But who would desire that? Who would desire to forever fail at solving philosophy, at amplifying human intelligence, or at creating an artificial one? I would certainly hate not to know the truth, to be forever fooled.


Foragers versus industrial-era folks

Consider the difference between a hunter-gatherer, who cares about his hunting success and about becoming the new tribal chief, and a modern computer scientist who wants to determine whether a “sufficiently large randomized Conway board could turn out to converge to a barren ‘all off’ state.”
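As an aside, the Conway question can at least be probed empirically. The sketch below simulates a randomly initialized Game of Life board and reports whether it dies out within a fixed number of generations. It assumes a finite toroidal grid, whereas the question concerns arbitrarily large boards, so a run like this is only suggestive, never a proof; the name random_life_run and the default parameters are illustrative.

```python
import numpy as np


def random_life_run(size=64, density=0.5, steps=2000, seed=0):
    """Simulate a randomized Conway board (toroidal wrap) and report whether
    it reaches the all-off state within `steps` generations."""
    rng = np.random.default_rng(seed)
    grid = (rng.random((size, size)) < density).astype(np.uint8)
    for step in range(steps):
        # Count live neighbours by summing the eight shifted copies of the grid.
        nbrs = sum(np.roll(np.roll(grid, dy, 0), dx, 1)
                   for dy in (-1, 0, 1) for dx in (-1, 0, 1)
                   if (dy, dx) != (0, 0))
        # A cell is alive next step with exactly 3 neighbours, or 2 if already alive.
        grid = ((nbrs == 3) | ((grid == 1) & (nbrs == 2))).astype(np.uint8)
        if not grid.any():
            return step + 1   # board went barren after this many generations
    return None               # still alive after `steps` generations
```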

The value of success in hunting down animals or in proving abstract conjectures about cellular automata is largely determined by factors such as your education, culture, and environmental circumstances. The same forager who cared about killing a lot of animals, to get the best ladies in his clan, might under different circumstances have turned out to be a vegetarian mathematician caring solely about his understanding of the nature of reality. The two sets of values are to some extent mutually exclusive, yet both are what the person wants, given the circumstances. Change the circumstances dramatically and you change the person’s values.

What do you really want?

You might conclude that what the hunter-gatherer really wants is to solve abstract mathematical problems; he just doesn’t know it. But there is no set of values that a person “really” wants. Humans are largely defined by the circumstances they reside in. If you already knew a movie, you wouldn’t watch it. Being able to get your meat from the supermarket changes the value of hunting.

If “we knew more, thought faster, were more the people we wished we were, and had grown up closer together,” then we would stop desiring what we have learnt, wish to think faster still, become yet different people, and get bored of and grow beyond the people similar to us.

A singleton is an attractor

A singleton will inevitably change everything by causing a feedback loop between itself as an attractor and humans and their values.

Many of our values and goals, what we want, are culturally induced or the result of our ignorance. Reduce our ignorance and you change our values. One trivial example is our intellectual curiosity: if we no longer need to figure out what we want on our own, our curiosity is impaired.

A singleton won’t extrapolate human volition but will instead implement an artificial set of values that result from abstract, higher-order contemplations about rational conduct.

With knowledge comes responsibility, with wisdom comes sorrow

Knowledge changes and introduces terminal goals. The toolkit called ‘rationality’, the rules and heuristics developed to help us achieve our terminal goals, also alters and deletes them. A Stone Age hunter-gatherer seems to possess very different values from ours, and learning about rationality and various ethical theories such as utilitarianism would alter those values considerably.

Rationality was meant to help us achieve our goals, e.g. become a better hunter. Rationality was designed to tell us what we ought to do (instrumental goals) in order to achieve what we want (terminal goals). Yet what actually happens is that we are told, and we learn, what we ought to want.

If an agent becomes more knowledgeable and smarter, its goal-reward system does not remain intact unless it has been specifically designed to be stable. An agent who originally wanted to become a better hunter and feed his tribe might end up wanting to eliminate poverty in Obscureistan. The question is: how much of this new “wanting” is the result of using rationality to achieve terminal goals, and how much is a side effect of using rationality? How much is left of the original values versus the values induced by a feedback loop between the toolkit and its user?

Take, for example, an agent that is facing the Prisoner’s Dilemma. Such an agent might originally tend to cooperate, and only after learning about game theory decide to defect and gain a greater payoff. Was it rational for the agent to learn about game theory in the sense that it helped the agent achieve its goal, or in the sense that it deleted one of its goals in exchange for an allegedly more “valuable” goal?
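To make the payoff reasoning concrete, here is a minimal sketch assuming the standard one-shot payoff numbers (3 for mutual cooperation, 5 for unilateral defection, 1 for mutual defection, 0 for being exploited); the values and the helper best_response are illustrative, not taken from the text above.

```python
# Row player's payoff in a one-shot Prisoner's Dilemma, assuming the standard
# values: mutual cooperation 3, unilateral defection 5, mutual defection 1,
# being exploited 0.
PAYOFF = {("C", "C"): 3, ("C", "D"): 0,
          ("D", "C"): 5, ("D", "D"): 1}


def best_response(other_move):
    """Return the move that maximizes the row player's payoff against `other_move`."""
    return max("CD", key=lambda my_move: PAYOFF[(my_move, other_move)])


# Defection dominates: it is the best response whatever the other player does,
# which is exactly what the game-theory-educated agent discovers.
assert best_response("C") == "D"
assert best_response("D") == "D"
```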

Beware rationality as a purpose in and of itself

It seems to me that becoming more knowledgeable and smarter is gradually altering our utility functions. But what is it that we are approaching if the extrapolation of our volition becomes a purpose in and of itself? Extrapolating our coherent volition will distort or alter what we really value by installing a new cognitive toolkit designed to achieve an equilibrium between us and other agents with the same toolkit.

Would a singleton be a tool that we can use to get what we want, or would the tool use us to do what it does? Would we be modeled, or would it create models? Would we be extrapolating our volition, or rather following our extrapolations?
