Articles by Alexander Kruel


Here is part of an interesting comment by Deen Abiola:

…in defining an AGI we are actually looking for a general optimization/compression/learning algorithm which when fed itself as an input, outputs a new algorithm that is better by some multiple. Surely this is at least an NP-Complete if not more problem. It may improve for a little bit and then hit a wall where the search space becomes intractable. It may use heuristics and approximations and what not but each improvement will be very hard won and expensive in terms of energy and matter. But no matter how much it tried, the cold hard reality is that you cannot compute an EXPonential Time algorithm in polynomial time unless (P=EXPTIME :S). A no self-recursive exponential intelligence theorem would fit in with all the other limitations (speed, information density, Turing, Gödel, uncertainties etc) the universe imposes.

The complete comment and a somewhat unrelated discussion can be found in this thread.

Further reading

Interview series on risks from AI



(An improved version of this post can be found here.)

If you were going to add huge amounts of intelligence to Google Maps, why would it become worse?

Sure, the space of unfriendly navigation software is much larger than the space of navigation software oriented toward navigating to good destinations – i.e., destinations consistent with human intent.

But what reason do we have to believe that improving our navigation software to the point of being generally intelligent will make it kill us?

Right now, if I ask Google Maps to navigate me toward McDonald’s, it does the job very well. So why would an ultraintelligent Google Maps misunderstand what I mean by “Take me to McDonald’s” and navigate me toward a McDonald’s in France, plunging me into the sea? Or drive me underground where the corpse of a man named McDonald lies?

I think that the idea that an ultraintelligent Google Maps would decide to kill all humans because they are a security risk is like saying that it would destroy all roads because that would make calculating routes less computationally expensive. After all, roads were never an explicit part of its goal architecture, so why not destroy them all? The idea completely misses the implicit constraints of any real-world goal.

You can come up with all kinds of complex fantasies where a certain kind of artificial general intelligence is invented overnight and suddenly makes a huge jump in capability, taking over the universe and destroying all human value.

That scenario is, however, completely unconvincing, given that technology is constantly being improved toward more user-friendliness and better results, and that malfunctions are seldom of such vast complexity that the malfunctioning system still works well enough to outsmart humanity.



(An improved version of this post can be found here.)

A cherished idea of AI risk proponents is that an expected utility maximizer will completely ignore anything which it is not specifically tasked to maximize.

One example is that if you tell a superintelligent expected utility maximizer to prevent human suffering, it might simply kill all humans, even though that is obviously neither what humans want an AI to do nor what they mean by “prevent human suffering”.

In the sense that the computation of an algorithm is deterministic, that line of reasoning is not illogical.

Instead of a superhuman agent, let us consider an oracle, an ultra-advanced version of Google or IBM Watson.

If I were to ask such an answering machine how to prevent human suffering, would it be reasonable to assume that the top result it returned would be to kill all humans? Would any product that returns similarly wrong answers survive even the earliest research phase, let alone market pressure?

Don’t get me wrong though. A thermostat is not going to do anything other than what it has been designed for. But an AI is very likely going to be designed to exhibit some amount of user-friendliness. That doesn’t mean that one can’t design an AI that lacks it, but the default outcome seems to be that an AI is not just going to act according to its utility function but also according to more basic drives, i.e. acting intelligently.

A fundamental requirement for any rational agent is the motivation to act maximally intelligently and correctly. That requirement seems even more obvious if we are talking about a conjectured artificial general intelligence (AGI) that is able to improve itself to the point where it is substantially better than humans at most activities. If it did not want to be maximally correct, it would not become superhumanly intelligent in the first place.

Consider giving such an AGI a simple goal, e.g. the goal of paperclip maximization. Is it really clear that human values are not implicit even in such a simplistic goal?

To pose an existential risk in the first place, an AGI would have to maximize paperclips in an unbounded way, eventually taking over the whole universe and converting all matter into paperclips. Given that no sane human would explicitly define such a goal, an AGI with the goal of maximizing paperclips would have to infer that it is implicitly meant to do so. But would such an inference make sense, given its superhuman intelligence?

The question boils down to how an AGI would interpret any vagueness present in its goal architecture and how it would deal with the implied invisible.

Given that any rational agent, especially an AGI capable of recursive self-improvement, wants to act in the most intelligent and correct way possible, it seems reasonable that it would interpret any vagueness in the way that most closely reflects how it was most probably meant to be interpreted.

Would it be intelligent and correct to ignore human volition in the context of maximizing paperclips? Would it be less wrong to maximize paperclips in the most literal sense possible?

The argument put forward by advocates of friendly AI is that any AGI that isn’t explicitly designed to be friendly won’t be friendly. But how much sense does this actually make?

Anyone who runs a business realizes that a contract with their customers includes unspoken, implicit parameters. Respecting those implied values of their customers is not a result of their shared evolutionary history but a result of their intelligence, which allows them to realize that the goal of their business implicitly includes those values.

Every human craftsman who enters into an agreement is bound by a contract that includes a lot of implied conditions. Humans use their intelligence to fill the gaps. For example, if a human craftsman is told to decorate a house, they are not going to attempt to take over the neighbourhood to protect their work.

A human craftsman wouldn’t do that, not because they share human values, but simply because it wouldn’t be sensible to do so given the implicit frame of reference of their contract. The contract implicitly includes the volition of the person who told them to decorate their house. They might not even like the way they are supposed to do it. It would simply be stupid to do it any other way.

Why would a superhuman AI not contemplate its own drives and interpret them within the right frame of reference, i.e. human volition? Why would a superhuman general intelligence misunderstand what is meant by “maximize paperclips”, while human intelligence is able to infer the correct interpretation?

It would in principle be possible to create a superintelligent machine that does kill all humans, but it would have to be explicitly designed to do so. As long as there is some vagueness involved, as long as its goal parameters are open to interpretation, a superintelligence will by definition arrive at the correct implications, or otherwise it wouldn’t be superintelligent in the first place. And for most goals, it would be incorrect to assume that human volition is not a relevant factor in the correct interpretation of how to act.

Further reading

Risks from AI and Charitable Giving



The following is based on the book ‘Das Ziegenproblem’ (‘The Goat Problem’, the German name for the Monty Hall problem) by Gero von Randow.

Setup:

  • There are 3 doors.
  • 1 door has a car behind it.
  • 2 doors each have a goat behind them.
  • The contents behind the doors are assigned at random.
  • There are 2 candidates, A and B.
  • There is 1 moderator who knows which door has a car behind it.

Actions:

  1. A and B are asked to choose a door and both choose the same door.
  2. The moderator opens one of the two other doors, one that has a goat behind it.
  3. A and B are asked if they would like to switch their choice and pick the remaining door.
  4. A always stays with his choice, the door that has been initially chosen by both A and B.
  5. B always changes her choice to the remaining third door.

Repeat the actions 999 times:

If you repeat the above list of actions 999 times, given the same setup, what will happen?

Candidate A always stays with his initial choice, which means that he will on average win 1/3 of all games: 1/3 * 999 = 333 cars.

But who won the remaining 666 cars?

Given the setup of the game, the moderator has to open a door with a goat behind it. Therefore the moderator wins 0 cars.

Candidate B, who always switched her choice after the moderator opened a door with a goat behind it, must have won the remaining 666 cars (2/3 * 999)!
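The 999-game thought experiment can be checked with a small simulation. This is a sketch under assumptions that are not in the original: the function name `play_two_candidates` is hypothetical, the random seed is fixed, and when two goat doors are available the moderator picks one of them uniformly at random. Because the car is always either behind the shared first choice or behind the single door left closed, A's and B's wins add up to exactly 999 in every run, while A's count fluctuates around 333.

```python
import random


def play_two_candidates(games: int, seed: int = 0) -> tuple[int, int]:
    """Simulate the two-candidate game; return (cars won by A, cars won by B)."""
    rng = random.Random(seed)
    wins_a = wins_b = 0
    for _ in range(games):
        car = rng.randrange(3)    # the car is placed behind a random door
        pick = rng.randrange(3)   # A and B both choose this door
        # The moderator opens a door that is neither the chosen door nor the car.
        goat_doors = [d for d in range(3) if d != pick and d != car]
        opened = rng.choice(goat_doors)
        remaining = next(d for d in range(3) if d != pick and d != opened)
        if pick == car:
            wins_a += 1           # A always stays with the first choice
        if remaining == car:
            wins_b += 1           # B always switches to the remaining door
    return wins_a, wins_b


a, b = play_two_candidates(999)
print(a, b, a + b)
```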

1 candidate and 100 doors:

Alter the above setup of the game in the following way:

  • There are 100 doors.
  • 1 door has a car behind it.
  • 99 doors have a goat behind it.
  • There is 1 candidate, A.

Alter the above actions in the following way:

  • The moderator opens 98 doors with goats behind them.

Now let’s say the candidate picks door number 8. By the rules of the game, the moderator now has to open 98 of the remaining 99 doors, all of them doors without a car behind them.

Afterwards there is only one closed door left besides door 8, the door the candidate has chosen.

You would probably switch your choice to the remaining door now. If so, the same should be the case with only 3 doors!
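The same comparison can be simulated for any number of doors. In this sketch (the function name `win_rates`, the seed and the trial count are hypothetical choices), the moderator literally opens all but one of the unchosen goat doors; the door that stays closed is then whatever is left. With 100 doors, switching should win about 99% of the time, and with 3 doors about 2/3 of the time.

```python
import random


def win_rates(doors: int, trials: int, seed: int = 0) -> tuple[float, float]:
    """Return (stay win rate, switch win rate); requires doors >= 3."""
    rng = random.Random(seed)
    stay_wins = switch_wins = 0
    for _ in range(trials):
        car = rng.randrange(doors)
        pick = rng.randrange(doors)
        closed = [d for d in range(doors) if d != pick]
        goats = [d for d in closed if d != car]
        # The moderator opens doors - 2 goat doors among the unchosen ones.
        opened = set(rng.sample(goats, doors - 2))
        other = next(d for d in closed if d not in opened)
        stay_wins += pick == car
        switch_wins += other == car
    return stay_wins / trials, switch_wins / trials


stay, switch = win_rates(100, 10_000)
print(f"100 doors: stay {stay:.3f}, switch {switch:.3f}")
stay3, switch3 = win_rates(3, 10_000)
print(f"3 doors:   stay {stay3:.3f}, switch {switch3:.3f}")
```

If the first pick is a goat, the only unchosen door the moderator may not open is the car, so it is necessarily the one left closed; that is why switching wins so often.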

Further explanation:

Your chance of picking the car with your initial choice is 1/3, but your chance of choosing a door with a goat behind it is 2/3. Thus, on average, in 2/3 of the games you play you will pick a goat on the first go. That also means that in those 2/3 of games, in which you picked a goat, the moderator will have to open the door with the only remaining goat, because by the rules of the game the moderator knows where the car is and is only allowed to open a door with a goat behind it.

What does that mean?

On average, on the first go, you pick a goat 2/3 of the time, and hence the moderator is forced to reveal the remaining goat 2/3 of the time. That means that 2/3 of the time there is no goat left, and only the car remains behind the other closed door. Therefore 2/3 of the time the remaining door has the car, which makes switching the winning strategy.
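The argument above can also be checked exactly rather than by simulation: list every equally likely pair of (car position, first pick) and count the pairs in which the first pick is a goat, since those are exactly the games that switching wins. This is a minimal sketch; the function name `exact_switch_probability` is hypothetical.

```python
from fractions import Fraction
from itertools import product


def exact_switch_probability(doors: int = 3) -> Fraction:
    """Exact probability that switching wins, by enumerating (car, first pick) pairs."""
    cases = list(product(range(doors), repeat=2))  # every (car position, initial pick)
    # Switching wins exactly when the first pick was a goat: the moderator must then
    # reveal all other goats, so the car is behind the door left closed.
    switch_wins = sum(1 for car, pick in cases if pick != car)
    return Fraction(switch_wins, len(cases))


print(exact_switch_probability())     # 3 doors
print(exact_switch_probability(100))  # 100 doors
```

For 3 doors, 6 of the 9 pairs have a goat as the first pick, giving 2/3; for 100 doors the same count gives 99/100.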




There are n = 4 sorts of candy to choose from and you want to buy k = 10 candies. How many ways can you do it?

This is a problem of counting combinations (order does not matter) with repetition (you can choose multiple items from each category). Below we will translate this problem into a problem of counting combinations without repetition, which can be solved using a better understood formula known as the “binomial coefficient”.

First let us represent the 10 candies by 10 symbols ‘C’ and divide them into 4 categories by placing a partition wall, represented by a ‘+’ sign, between adjacent sorts of candy:

CC+CCCC+C+CCC

Note that there are 10 symbols ‘C’ and 3 partition walls, represented by a ‘+’ sign. That is, there are n-1+k = 13 symbols, equivalently n+k-1. Each of the 3 partition walls could be in any of the 13 positions. In other words, to represent various choices of 10 candies from 4 categories, the partition walls can be rearranged by choosing n-1 = 3 of the n+k-1 = 13 positions, for example:

C++CCC+CCCCCC

CCCCCCCCCC+++

We have now translated the original problem into choosing 3 of 13 available positions.

Note that each position can only be chosen once. Further, the order of the positions does not matter, since choosing positions {1, 4, 12} separates the candies in exactly the same way as choosing {4, 12, 1}. This means that we are now dealing with combinations without repetition.
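To make the translation concrete, here is a small sketch that turns a choice of wall positions into the ‘C’/‘+’ string and into the number of candies of each sort. The helper names `walls_to_string` and `walls_to_counts` are hypothetical, and positions are numbered from 1 as in the text. For example, CC+CCCC+C+CCC has its walls at positions 3, 8 and 10 and encodes 2, 4, 1 and 3 candies of the four sorts.

```python
from itertools import combinations

SORTS, CANDIES = 4, 10           # n and k from the example
POSITIONS = SORTS - 1 + CANDIES  # n+k-1 = 13 symbols in total


def walls_to_string(walls: tuple[int, ...]) -> str:
    """Render 1-based wall positions as a string of 'C' (candy) and '+' (wall)."""
    return "".join("+" if p in walls else "C" for p in range(1, POSITIONS + 1))


def walls_to_counts(walls: tuple[int, ...]) -> list[int]:
    """Read off how many candies of each sort a choice of wall positions encodes."""
    counts, current = [], 0
    for p in range(1, POSITIONS + 1):
        if p in walls:
            counts.append(current)  # a '+' closes the current sort
            current = 0
        else:
            current += 1            # a 'C' adds one candy to the current sort
    counts.append(current)          # candies after the last wall
    return counts


print(walls_to_string((3, 8, 10)), walls_to_counts((3, 8, 10)))

# combinations() yields each set of positions once, in increasing order, so
# {4, 12, 1} and {1, 4, 12} count as the same choice.
selections = {tuple(walls_to_counts(w))
              for w in combinations(range(1, POSITIONS + 1), SORTS - 1)}
print(len(selections))
```

Every choice of 3 positions out of 13 yields a different selection of 10 candies, which is why counting the wall placements counts the selections.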

Combinations without repetition, choosing k of n items, can be counted using the formula known as the binomial coefficient:

n!/(k!(n-k)!)

As justified above, to count combinations with repetition, simply replace n with n+k-1 and k with n-1:

(n+k-1)!/((n-1)!((n+k-1)-(n-1))!)

In our example above this would be (4+10-1)!/((4-1)!((4+10-1)-(4-1))!) = 13!/(3!10!) = 286. Since (n+k-1)-(n-1) = k, this is equivalent to

(n+k-1)!/(k!(n-1)!)

In our example, (4+10-1)!/(10!(4-1)!) = 13!/(10!3!) = 13!/(3!10!), which is the same result that we got above.
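As a sanity check, the closed-form count can be compared with a brute-force enumeration. This sketch uses only the Python standard library; the helper name `multichoose` is hypothetical. `itertools.combinations_with_replacement` lists every multiset of 10 candies drawn from 4 sorts exactly once, so its length should equal (n+k-1)!/(k!(n-1)!).

```python
from itertools import combinations_with_replacement
from math import comb, factorial


def multichoose(n: int, k: int) -> int:
    """Combinations with repetition of k items from n sorts: (n+k-1)! / (k! (n-1)!)."""
    return factorial(n + k - 1) // (factorial(k) * factorial(n - 1))


n, k = 4, 10
# Brute force: every multiset of 10 candies drawn from 4 sorts, listed once each.
brute_force = sum(1 for _ in combinations_with_replacement(range(n), k))
print(multichoose(n, k), comb(n + k - 1, n - 1), comb(n + k - 1, k), brute_force)
```

All four numbers agree, illustrating that choosing the n-1 wall positions and choosing the k candy positions among n+k-1 slots give the same count.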




If you wonder, Robert was some dude playing the same game: Motocross Madness 2



(Note: The following is a quick and dirty polishing of an outdated post that I wrote years ago.)

What free will isn’t

A man can do what he wants, but not want what he wants.

— Arthur Schopenhauer

Free will does not and cannot be defined as the ability to make uncaused, random or unpredictable decisions. All those qualities, although partly present in complex systems, would contradict the notion of willful intent. What we want, and therefore do, must be based on reasonable grounds. Random convulsions do not satisfy our notion of volition. Volition is defined as purposive striving and thus has to have a reason; it has to be the result of causal relationships.

What really matters to you is not that nobody is able to predict what you are going to do, but that you are able to do it. What matters is that nothing prevents the realization of your goals. What matters is that you are free to do what you want.

What you want to do in the first place is not a matter of choice. You don’t even care about that. You solely care about being able to satisfy your needs and preferences. And I think that is the only reasonable, necessary and desired definition of free will that exists: to be free to realize what you want.

And that is also where predictability becomes an important aspect. If you are able to predict that you are unable to realize a certain goal, then you feel constrained. But most of the time we do not know if we will be able to realize our goals. We are unable to predict our success. That’s why we have to try. Uncertainty allows us to feel capable and therefore free.

Ask yourself, what is it that you want to be “free from”? You just want to be “free to”. To be free to do what you want. You do not want to be free from the constraints of mathematics, physics and rationality. You want to be constrained by reason, sanity and rules. You don’t want to be free in any sense that contradicts determinism. What you want is possibility, potentiality, enough room, enough resources to possibly realize your goals.

A futile definition

There’s no scientific reason to believe that we have free will. There’s no buffer zone that we’ve found in any of the physical laws of how the universe works to make room for free will. There’s non-determinism; but there’s not choice. Choice is the introduction of something, dare I say it, supernatural: some influence that isn’t part of the physical interaction, which allows some clusters of matter and energy to decide how they’ll collapse a probabilistic waveform into a particular reality.

Mark Chu-Carroll

Looking for free will in a strong philosophical sense is a futile effort. It is like searching for the end of the rainbow. Reality, reason and logic forbid the notion of libertarian free will, because libertarian free will implies freedom of choice, which in turn implies absolute control, which is impossible.

Here is the problem. Internal causes are ultimately indistinguishable from causal relationships between a defined agent and the environment it resides in. Yet sufficient control over internal causes is impossible.

To have a choice, an agent would have to understand its own workings and motives completely. Yet no system can completely understand itself, because that very understanding would be part of the system and would forever evade itself, like a bin trying to contain itself.

A redefinition

Determinism is true but thermostats can still control the temperature. And nobody denies that thermostats control the temperature.

— Steven Landsburg paraphrasing Robert Nozick in The Big Questions

I would like to define free will as an agent’s ability to transform the world. Free will is the influence an agent exerts on the world versus the influence that the world has on the agent. More precisely, an agent can make free decisions if its internal stability can withstand external influences to a greater extent than the external influences can withstand its influence.

The degree to which an agent qualifies as free is dependent on the extent to which it satisfies the following criteria:

  • Its goals and internal decision procedures are stable under environmental influences.
  • It exerts goal-oriented, specific and orderly influence on the environment.
  • The complexity of the transformations by which it shapes the outside environment in which it is embedded outweighs the environmental influence on itself.

Free will is a middleman.
Consciousness between cause and effect.
The intelligent refinement of causation into an effective agent.
The sun at your back – your shadow in front.
You are the shadow player.
Nevertheless, to claim sovereignty is trying to get ahead of your own shadow.
You imprint reality with a pattern of volition. But not without its implicit consent.

Is it real?

How does all this relate to our actual experience of free will and our use of the concept?

You have free will, and experience a greater degree of freedom, in proportion to the influence and control you exert over the environment relative to the influence the environment exerts over you.

Here is an example. Children and some mentally handicapped people are not responsible for their actions in the same way that healthy adults are. They cannot give consent or enter into legally binding contracts. One of the reasons for this is that they lack control and are easily influenced by others. Healthy adults exert more control than children and handicapped people.

Is it useful?

How much sense does all this make? I don’t know. I do not have the expertise to base my ideas on firm ground or even judge the credibility of my thoughts. Nonetheless, so far the above is as close as I can get towards a satisfying framework for the notion of free will.

I must also admit that my definition of free will only works once you arbitrarily define a system as an entity within an environment, rather than as the environment itself.

The universe really just exists. And it appears to us that it is unfolding because we are part of it. We appear to each other to be free and intelligent because we believe that we are not part of it.

Nevertheless, I think it might after all be a useful definition when it comes to science, psychology and law. It might also very well address our understanding of being free agents.

Don’t get me wrong, though: I believe that, from a purely practical point of view, we can do without the notion of free will just fine. People still have to go to jail to protect society, to be educated, and because a general policy of deterrence is useful. Responsibility is not necessary.



Followup to: Acknowledge and allow for your needs

In his post ‘The End of Rationality’ muflax wrote:

I’m basically done with rationality.

Ok, seriously now. I’ve always enjoyed XiXiDu’s criticisms on LW, but for over a year now, whenever I read his stuff I wonder why he keeps on making it. I mean, he has been saying (more-or-less correctly so, I think) that SIAI and the LW sequences score high on any crackpot test, that virtually no expert in the field takes any of it seriously, that rationality (in the LW sense) has not shown any tangible results, that there are problems so huge you can fly a whole deconstructor fleet through, that the Outside View utterly disagrees with both the premises and conclusions of most LW thought, that actually taking it seriously should drive people insane [...]

The keyword here is approximation. Just because general relativity and quantum mechanics break down in describing singularities, it doesn’t mean that we’re “done with” those theories.

If some type of otherwise rational behavior leads to absurd, undesirable or unbearable consequences, then, in the absence of a better heuristic, you approximate the behavior as far as possible.

All you have to realize is that a reflective equilibrium is possible: a state in which you balance all kinds of evidence with your preferences, elementary needs, and computational and general resource limitations.

There are basically four weighted levels:

  • Level 1: Contemplation/Rationality (conscious, reflective, high-level cognition; trying to do what is objectively right).
  • Level 2: Instinct, intuition and gut feeling (tapping your unconscious evolutionary resources).
  • Level 3: Satisfaction of elementary needs (doing what you have to do because you need to do it, which includes having fun; paying attention to your limitations).
  • Level 4: Doing what you want based on naive introspection.

As far as I know, Level 1 should carry the most weight. But the weighting can change depending on the circumstances. For example, if Level 2 is sufficiently strong, it can cause you to discount some Level 1 considerations.



Consider an agent A that assumes it makes only correct decisions. Here an arbitrary decision is denoted d, and Cd denotes that d is correct, where a correct decision is defined as any decision (or set of decisions) maximizing expected utility according to the agent’s utility function U. A therefore assumes CA: every decision that A is capable of making belongs to the set of correct decisions (for every decision d that A can make, Cd).

Let k be one possible decision, defined as k := ¬Cd, where ¬Cd is true if decision d does not maximize the expected utility of agent A.

If A ever decides k, this will falsify its assumption that it only makes correct decisions (CA) and hence prove it to be incorrect (¬CA). But since A assumes itself to make only correct decisions, it believes that it will never decide k. Therefore CA iff ¬k. Substituting ¬Cd for k yields CA iff ¬¬Cd, that is, CA iff Cd (A is correct if and only if its decisions are correct).

Now assume that A decides k anyway (e.g. because a cosmic ray causes a malfunction in its decision module). Since A assumes CA, it follows that k must have been a correct decision (k → Ck). Substituting ¬Cd for k yields ¬Cd → C¬Cd, which is a contradiction and in turn implies ¬CA (A is incorrect).



In this post I try to sketch an informal definition of the self, the “essential qualities that constitute a person’s uniqueness”. I assume that the most important requirement for a definition of self is time-consistency. A reliable definition of identity needs to allow for time-consistent self-reference, since any agent that is unable to identify itself over time will be prone to making inconsistent decisions.

Data Loss

Obviously most humans don’t want to die, but what does that mean? What is it that humans try to preserve when they sign up for cryonics? It seems that an explanation must account for, and allow for, some sort of data loss.

The Continuity of Consciousness

It can’t be about the continuity of consciousness, or we would have to refuse general anesthesia due to the risk of “dying”. Most of us will agree that there is something more important than the continuity of consciousness, which is why we accept general anesthesia when necessary.

Computation

If the continuity of consciousness isn’t the most important aspect of the self, then it very likely isn’t the continuity of computation either. Imagine that for some reason the process evoked when “we” act on our inputs under the control of an algorithm halts for a second and then continues otherwise unaffected. Would we then consider ourselves no longer alive, because we died when the computation halted? This doesn’t seem to be the case.

Static Algorithmic Descriptions

Although we are not partly software and partly hardware, we could, in theory, come up with an algorithmic description of the human machine, of our selves. Might it be that algorithm that we care about? If we were to digitize ourselves, we would end up with a description of our spatial parts, our self at a certain time. Yet we forget that all of us already possess such an algorithmic description of ourselves, and we are already able to back it up: our DNA.

Temporal Parts

Admittedly our DNA is the earliest version of our selves, but if we don’t care about the temporal parts of our selves, only about a static algorithmic description of a certain spatiotemporal position, then what’s wrong with that? Quite a lot, it seems: we stop caring about past reifications of our selves, and at some point our backups become obsolete, so that having to fall back on them would equal death. But what is it that we lost? What information do we value more than all of the previously mentioned possibilities? One might think that it must be our memories, the data that represents what we learnt and experienced. But even if this is the case, would it be a reasonable choice?

Identity and Memory

Let’s just disregard the possibility that we often might not value our future selves, and so do not value our past selves either, because we lost or gained important information, e.g. if we became religious or managed to overcome religion.

If we had perfect memory and only ever improved upon our past knowledge and experiences, we wouldn’t be able to do so for very long, at least not given our human body. The upper limit on the information that can be contained within a human body is 2.5072178×10^38 megabytes, if it were used as perfect data storage. Given that we gather much more than 1 megabyte of information per year, it is foreseeable that if we equate our memories with our self, we’ll die long before the heat death of the universe. We might overcome this by growing in size, by achieving a posthuman form, yet if we in turn also become much smarter, we’ll also produce and gather more information. We are not alone either, and resources are limited. One way or the other, we’ll die rather quickly.
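The arithmetic behind this claim can be spelled out in a few lines. The storage limit and the 1 megabyte per year rate are the figures quoted above (the rate is a deliberately low bound, since we gather much more). The heat-death timescale is not in the original: the value of 10^100 years below is an assumed order of magnitude, used only for illustration.

```python
# Figures from the text, except HEAT_DEATH_YEARS, which is an assumed order of magnitude.
CAPACITY_MB = 2.5072178e38  # upper limit on information storable in a human body
RATE_MB_PER_YEAR = 1.0      # a deliberately low intake; actual intake is much higher
HEAT_DEATH_YEARS = 1e100    # assumption for illustration only, not from the original

years_until_full = CAPACITY_MB / RATE_MB_PER_YEAR
print(f"years until full at 1 MB/year: {years_until_full:.3e}")
print(f"fraction of the assumed heat-death timescale: {years_until_full / HEAT_DEATH_YEARS:.1e}")
```

Even at this unrealistically slow rate, a perfect memory would fill up after a vanishingly small fraction of the assumed timescale, and any faster rate only shortens that time.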

Does this mean we shouldn’t even bother about the far future or is there maybe something else we value even more than our memories? After all we don’t really mind much if we forget what we have done a few years ago.

Time-Consistency and Self-Reference

It seems that there is something even more important than our causal history. I think that more than anything else we care about our values and goals. Indeed, we value the preservation of our values. As long as we want the same things, we are the same. Our goal system seems to be the critical part of our implicit definition of self, that which we want to protect and preserve. Our values and goals seem to be the missing temporal parts that allow us to consistently refer to ourselves, to identify ourselves at different spatiotemporal positions.

Using our values and goals as identifiers also resolves the problem of how we should treat copies of ourselves with alternative histories and memories, copies with different causal histories. Any agent that has a copy of our utility function ought to be incorporated into our decisions as an instance, as a reification of ourselves. We should identify with our utility function regardless of its instantiation.

Stable Utility-Functions

To recapitulate, we can value our memories, the continuity of experience and even our DNA, but the only reliable marker for the self identity of goal-oriented agents seems to be a stable utility function. Rational agents with an identical utility function will to some extent converge to exhibit similar behavior and are therefore able to cooperate. We can more consistently identify with our values and goals than with our past and future memories, digitized backups or causal history.

But even if this is true there is one problem, humans might not exhibit goal-stability.

