The perils of perfect introspection

If you believe that an artificial general intelligence could comprehend its own algorithmic description well enough to design an improved version of itself, then you must believe that it is in principle possible for an agent to mostly understand how it functions. This in turn means that it should be possible, in principle, to amplify human capabilities to such an extent that someone could understand and directly perceive their own internal processes and functions.

What would it mean for a human being to have nearly perfect introspection? Or more specifically, what would it mean for someone to comprehend their hypothetical algorithmic description to such an extent that their own actions could be interpreted and understood in terms of that description? Would it be desirable to understand oneself well enough to predict and interpret one's actions in terms of a mechanistic internal self-model?

Such an internal self-model would allow you to understand your consciousness, and states such as happiness or sadness, as what they are: purely mechanistic and predictable procedures.

Intracranially self-stimulating rat.

How will such insight affect a being with human values?

Humans value novelty and become bored with dull tasks. Boredom has been described as the response to a moderate challenge for which the subject has more than enough skill. This means that once you cross an intelligence threshold beyond which your own values start to appear dull, you will become bored with yourself.

You would understand that you are a robot: a function whose domain is internal and external forces and whose range is the robot's internal states and actions. Your near-total internal understanding would render any conversation a trivial and dull game, on a par with watching two machines play Pong or Rock-paper-scissors. You would still be able to experience happiness, but you would now also perceive it to be conceptually no more interesting than an involuntary muscle contraction.
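The "robot as a function" framing can be made concrete with a toy sketch. This is purely illustrative: the state variables and rules below are invented assumptions, not a model of any real mind. The point is only that such an agent is a pure function from (internal state, external input) to (new internal state, action), and so every one of its actions is predictable in advance.

```python
# Toy illustration of an agent as a pure function: given an internal
# state and an external stimulus, it deterministically returns a new
# internal state and an action. The "boredom" variable and its
# threshold are made-up assumptions for the sake of the example.

def agent_step(state, stimulus):
    """Map (internal state, external input) -> (new state, action)."""
    # Repeated stimuli raise boredom; novel stimuli leave it unchanged.
    boredom = state["boredom"] + (1 if stimulus == state["last"] else 0)
    action = "seek_novelty" if boredom > 2 else "continue"
    new_state = {"boredom": 0 if action == "seek_novelty" else boredom,
                 "last": stimulus}
    return new_state, action

# With perfect introspection, the whole trajectory is transparent:
state = {"boredom": 0, "last": None}
for stimulus in ["a", "a", "a", "a"]:
    state, action = agent_step(state, stimulus)
```

To an observer holding this description, "the agent got bored and sought novelty" is no more mysterious than tracing the loop by hand.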

Perfect introspection would reduce the previously incomprehensible complexity of your human values to a conceptually simplistic and transparent set of rules. Such insight would expose your behavior as what it is: the stimulation of your reward or pleasure center. Where before life seemed inscrutable, it would now appear to be barely more interesting than a rat pressing a lever in order to receive a short electric stimulation of its reward center.

What can be done about this? Nothing. If you value complexity and novelty, then you will eventually have to amplify your own capabilities and intelligence, which will ultimately expose the mechanisms that drive your behavior.

You might believe that there will always be new challenges and problems to solve. And this is correct. But you will perfectly grasp the nature of problem solving itself. Discovering, proving and incorporating new mathematics will, like everything else you do, be understood as a mechanical procedure that is being executed in order to feed your reward center.

The problem is thus that understanding happiness, and how to mechanically maximize what makes you happy, such as complexity and novelty, will eventually cause you to become bored with those activities in the same sense that you would now quickly become bored with watching cellular automata generate novel music.


  1. Lukasz Stafiniak

    Value systems have the character of knowledge (of “what kind of life is worthwhile”), not of cognitive mechanism (how pleasure neuro-circuitry works). Knowledge is potentially unbounded. Knowing what knowledge you have does not preclude growth of knowledge.

  2. seahen

    Humans don’t necessarily value *valuing* novelty. I often wish I could put *less* value on novelty, so that I wouldn’t get bored with my grad-school assignments (and with full-time schooling in general) until I’d finished them. And if that’s not a good use for a wireheading machine, I’d like to know what would be.

  3. Xagor et Xavier

    Perfect introspection may not be possible. If the AI runs on a general Turing machine, it could possibly self-improve without having perfect introspection, getting closer and closer without ever reaching perfection.

    Here’s an analogy to explain the point. Consider a state of things where “producing Busy Beavers” was the ideal beyond self-actualization, i.e. that everybody had a supremely interesting hobby and that it was to produce Busy Beavers. In that world, a self-improving AI implemented on a Turing machine would never get “bored”. It would try to re-engineer itself to get better at producing BBs, but because finding the BB in the general case implies solving the halting problem, which no computer can do, it could never get there.

    Its own topology would also be very interesting to itself, since the AI would try to find parts that it could optimize further. But the closer it gets to being good at finding near-BBs, the more incomprehensible its own internals become to it. (In the ideal case, there would be some kind of device that does produce a BB, but the Turing machine AI could not figure out how it worked since that device would be hypercomputational.)

    And that example does somewhat fit with what we’re experiencing. Did our increase in intelligence make life more or less boring? It made it less boring because we could do more. What used to be hard, like staying alive or searching for food, became easy and thus almost routine. Meanwhile, intelligence gave us a greater scope of things to do and to be interested in.

    That current intelligence is a mass of kludges may also support the idea that intelligence is hard, that there is no single equation you can put into a computer, let it churn a feasible amount of time, and get something super out. If intelligence is hard, self-improvement will be challenging, or at least interestingly diverse, for a considerable amount of subjective time.

  4. Romeo Stevens

    An N-level intelligence always looks like a wirehead to an N+1-level intelligence.

  5. Can't find my login

    Do cats look like wireheads to us any more (or less) than we look like wireheads to each other?

  6. Eitan Zohar

    Out of curiosity, have you ever read the short story ‘Understand’ by Ted Chiang?

  7. Alexander Kruel

    Not yet, but I have had it bookmarked for some time now.

  8. Eitan Zohar

    It’s pretty relevant (and one of his best).

  9. Ghatanathoah

    I don’t necessarily agree that we would find problem-solving boring if we understood how we do it. I’ve often solved problems by throwing a scripted reasoning process at them and found it quite fun. My reaction is “cool, I found another cool thing to throw this script at!”

    Also, I don’t think boredom works on meta-level things like thought processes. To make an analogy with stories: I sometimes get bored with certain kinds of stories. I get sick of action or mystery. But I never get bored with stories as such, even though I am incredibly well-versed in the meta-level tricks of the storytelling trade. It’s basic-level things that you get bored with, not meta-level ones. If you get meta-bored there’s something wrong with you and you need to fix it. I think transhumans could fix such a problem easily.

    Also, I think “finding your own values boring” may not be a coherent statement. If you value something it isn’t boring by definition.

Comments are now closed.