
Background: Newcomb’s problem

Objective: The problem I am trying to highlight with this post is not the difficulty of predicting another agent accurately but (1) the problem of stating precisely what it is that Omega is predicting in the first place (2) that locating and isolating a discrete agent in a continuous universe by e.g. formalizing the boundaries of the physical system in question seems to be nontrivial for complex agents (3) how to think about decision making when decisions are determined not just by the agent (as arbitrarily defined by humans) but by the larger environment.

The ability to accurately predict the decision making of other agents is insufficient if it is not possible to define what is meant by <decision making> and <agent>.

Newcomb’s problem: Ignoring the problems mentioned above, one-boxing is the correct strategy, given that Omega is correct more than 50.05% of the time. Since:


Two-boxing:

If (prediction == One-box)
    return (1000000 + 1000)
else
    return 1000

Expected value of two-boxing: y = (1-x)*(1000000+1000) + x*1000 = -1000000x + 1001000, where 0 < x < 1 is the probability of a correct prediction.


One-boxing:

If (prediction == One-box)
    return 1000000
else
    return 0

Expected value of one-boxing: z = x*1000000 + (1-x)*0 = 1000000x, where 0 < x < 1 is the probability of a correct prediction.

Two-boxing versus One-boxing:

y > z

-1000000x+1001000 > 1000000x

1001000 > 2000000x

1001000/2000000 > x

0.5005 > x

As long as the probability of a correct prediction is less than 50.05%, two-boxing has the larger expected value.
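The derivation above can be checked numerically. A minimal sketch, using the payoff figures from the post (the function names are my own, not the post's):

```python
# Expected values in Newcomb's problem, where x is the probability
# that Omega's prediction is correct (0 < x < 1).

def ev_two_box(x):
    # Omega wrong (predicted one-box): both boxes are full -> 1,001,000.
    # Omega right (predicted two-box): only the $1,000 box pays out.
    return (1 - x) * 1_001_000 + x * 1_000

def ev_one_box(x):
    # Omega right (predicted one-box): the $1,000,000 box is full.
    # Omega wrong: the box is empty -> 0.
    return x * 1_000_000

# Crossover point: two-boxing wins exactly when x < 1,001,000 / 2,000,000.
crossover = 1_001_000 / 2_000_000  # 0.5005
```

Below the crossover `ev_two_box(x) > ev_one_box(x)`, above it the inequality flips, matching the derivation.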

Consider the following scenarios:

(S1) Omega predicts that you will end up taking both boxes because, even though at some point you did precommit to one-boxing, you change your mind and take both boxes.

(S2) Omega predicts that you will end up taking both boxes because of a stroke causing brain damage.

(S3) Omega predicts that a sudden wind gust will cause you to stumble and topple over both boxes, even though you did precommit to taking only one box.

(S4) You make up one half of a split brain residing in the same body. You precommit to one-boxing while the other personality sharing the body with you chooses two-boxing. You have no control of the movement of the body except that you are the one who can talk.


If a body harboring two or more personalities with different precommitment strategies about Newcomb-like problems ends up taking both boxes, did all of the agents residing in that body take two boxes, or just the one that happened to control the body during the critical moment?

It seems possible to adopt a wide range of definitions of “agency” when trying to reason about and predict the behavior of other agents. An agent can be defined as a global or as a local physical system, i.e., as a larger or smaller slice of space. In other words, when examining a system it is possible either to act on the assumption that the whole system is one coherent entity or to assign the quality of agency to arbitrary sub-procedures of the system and examine them in isolation.

For example, if Omega were to assign the quality of agency to a volume of space approximately the size of the human brain, would a precommitment to one-boxing then satisfy Omega’s condition for putting $1,000,000 into box A? Would a case of two-boxing that results from, e.g., brain damage caused by external factors then be ignored?

So what is it that Omega predicts when your actions are ultimately the local behavior of a larger physical system we call the universe?

Can you formalize the difference between taking both boxes because (1) you changed your mind for subtle reasons (e.g. reading a decision theory paper), (2) you changed your mind for not-so-subtle reasons (e.g. brain damage), (3) you are not in control of the larger physical system (e.g. a sudden strong wind causes you to stumble), or (4) you do not control “your” body (e.g. multiple personality disorder)?



In this post I try to sketch an informal definition of self, the “essential qualities that constitute a person’s uniqueness”. I assume that the most important requirement for a definition of self is time-consistency. A reliable definition of identity needs to allow for time-consistent self-reference, since any agent that is unable to identify itself over time will be prone to making inconsistent decisions.

Data Loss

Obviously most humans don’t want to die, but what does that mean? What is it that humans try to preserve when they sign up for cryonics? It seems that any explanation must account for, and allow, some sort of data loss.

The Continuity of Consciousness

It can’t be about the continuity of consciousness: otherwise we would have to refuse general anesthesia due to the risk of “dying”, and most of us will agree that there is something more important than the continuity of consciousness that makes us accept general anesthesia when necessary.


If the continuity of consciousness isn’t the most important aspect of the self, then it very likely isn’t the continuity of computation either. Imagine that, for some reason, the process evoked when “we” act on our inputs under the control of an algorithm halts for a second and then continues otherwise unaffected. Would the person who continues afterwards no longer be us, because “we” died when the computation halted? This doesn’t seem to be the case.

Static Algorithmic Descriptions

Although we are not partly software and partly hardware, we could, in theory, come up with an algorithmic description of the human machine, of our selfs. Might it be that algorithm that we care about? If we were to digitize our self we would end up with a description of our spatial parts, our self at a certain time. Yet we forget that all of us already possess such an algorithmic description of our selfs, and we’re already able to back it up: it is our DNA.

Temporal Parts

Admittedly our DNA is the earliest version of our selfs, but if we don’t care about the temporal parts of our selfs but only about a static algorithmic description at a certain spatiotemporal position, then what’s wrong with that? A lot, it seems: we stop caring about past reifications of our selfs; at some point our backups become obsolete, and having to fall back on them would equal death. But what is it that we lost? What information is it that we value more than all of the previously mentioned possibilities? One might think that it must be our memories, the data that represents what we learnt and experienced. But even if this is the case, would it be a reasonable choice?

Identity and Memory

Let’s disregard the possibility that we often might not value our future selfs, and do not value our past selfs either, because we lost or gained important information, e.g. if we became religious or have been able to overcome religion.

If we had perfect memory and only ever improved upon our past knowledge and experiences, we wouldn’t be able to do so for very long, at least not given our human body. The upper limit on the information that can be contained within a human body is 2.5072178×10^38 megabytes, if it were used as perfect data storage. Given that we gather much more than 1 megabyte of information per year, it is foreseeable that, if we equate our memories with our self, we’ll die long before the heat death of the universe. We might overcome this by growing in size, by achieving a posthuman form; yet if we in turn also become much smarter, we’ll also produce and gather more information. We are not alone either, and resources are limited. One way or the other, we’ll die rather quickly.
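The arithmetic here can be made explicit. A rough sketch, using the post’s capacity figure, a deliberately conservative intake rate, and the common rough estimate of ~10^100 years for the heat death of the universe:

```python
# How long until a "memory equals self" identity runs out of storage?

capacity_mb = 2.5072178e38    # megabytes; the post's upper bound for a human body
intake_mb_per_year = 1.0      # conservative: the post says we gather far more

years_until_full = capacity_mb / intake_mb_per_year  # ~2.5e38 years

heat_death_years = 1e100      # rough common estimate

# Storage is exhausted vastly earlier than the heat death of the universe,
# and any faster intake rate only shortens the window.
print(years_until_full < heat_death_years)  # prints True
```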

Does this mean we shouldn’t even bother about the far future or is there maybe something else we value even more than our memories? After all we don’t really mind much if we forget what we have done a few years ago.

Time-Consistency and Self-Reference

It seems that there is something even more important than our causal history. I think that more than anything else we care about our values and goals. Indeed, we value the preservation of our values. As long as we want the same we are the same. Our goal system seems to be the critical part of our implicit definition of self, that which we want to protect and preserve. Our values and goals seem to be the missing temporal parts that allow us to consistently refer to us, to identify our selfs at different spatiotemporal positions.

Using our values and goals as identifiers also resolves the problem of how we should treat copies of our self featuring different histories and memories, copies with different causal histories. Any agent that features a copy of our utility function ought to be incorporated into our decisions as an instance, as a reification, of our selfs. We should identify with our utility function regardless of its instantiation.
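A toy sketch of this identification rule (purely illustrative; the `Agent` structure and `same_self` predicate are hypothetical, not a formalism from the post): two copies whose memories have diverged still count as the same self if and only if their utility functions agree.

```python
from dataclasses import dataclass

@dataclass
class Agent:
    memories: list   # causal history; may diverge between copies
    utility: dict    # outcome -> value; the proposed identifier of self

def same_self(a: Agent, b: Agent) -> bool:
    # Identity is decided by the utility function alone, not by memories.
    return a.utility == b.utility

original = Agent(memories=["childhood", "read a paper"],
                 utility={"preserve_values": 1.0})
upload   = Agent(memories=["childhood", "woke up as a copy"],
                 utility={"preserve_values": 1.0})

assert same_self(original, upload)  # different histories, same self
```

The design choice this illustrates: memory comparison would classify the two instances as different people, while utility-function comparison treats them as reifications of one self.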

Stable Utility-Functions

To recapitulate, we can value our memories, the continuity of experience and even our DNA, but the only reliable marker for the self identity of goal-oriented agents seems to be a stable utility function. Rational agents with an identical utility function will to some extent converge to exhibit similar behavior and are therefore able to cooperate. We can more consistently identify with our values and goals than with our past and future memories, digitized backups or causal history.

But even if this is true there is one problem, humans might not exhibit goal-stability.
