How does a consequentialist AI work?

Below are some features of the kind of artificial general intelligence (short: AI) that people use as a model to infer that artificial general intelligence constitutes an existential risk:

  • It will want to self-improve
  • It will want to be rational
  • It will try to preserve its utility function
  • It will try to prevent counterfeit utility
  • It will be self-protective
  • It will want to acquire resources and use them efficiently

In short, they imagine a consequentialist expected utility maximizer.

Can we say anything specific about how such an AI could work in practice? And if we are unable to approximate a practical version of such an AI, is it then sensible to use it as a model to make predictions about the behavior of practical AIs?

A goal that is often used in such a context is <maximize paperclips>. How would an AI with the above-mentioned features act, given such a goal?

  • What would be its first action?
  • How long would it reason about its first action?
  • Is reasoning itself an action? If so, how long would it reason about (1) how to reason and (2) for how long to reason about reasoning…?
  • How would it deal with low-probability possibilities such as (1) aliens that might try to destroy it, (2) time travel, or (3) the possibility that this universe is being simulated? In the last case, the expected value of hacking the simulation might outweigh the low probability of success, because the higher-level universe is conjectured to have an enormous amount of resources available in order to be capable of simulating this one.

All of those questions can be answered by suggesting certain bounds and limitations. But if it is possible to limit such an AI in such a way as to make it disregard certain possibilities, and to limit its planning horizon or the computational resources it spends, then how is it any harder to prevent it from causing human extinction? And if such bounds are not possible, then how could it work at all? And if it does not work, then how are the actions of such an AI decision-relevant for humans with respect to risks associated with practical AI?
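To make the low-probability question and the idea of a bound concrete, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the `Action` class, the `choose` function, the `min_probability` cutoff, and above all the probabilities and payoffs, which are invented numbers and not estimates of anything.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Action:
    """A hypothetical action with a made-up success probability and payoff."""
    name: str
    probability: float  # assumed chance that the action succeeds
    payoff: float       # assumed number of paperclips gained on success

    @property
    def expected_utility(self) -> float:
        return self.probability * self.payoff


def choose(actions: Iterable[Action], min_probability: float = 0.0) -> Action:
    """Return the action with the highest expected utility.

    Actions whose success probability falls below min_probability are
    disregarded entirely; this cutoff is the illustrative "bound".
    """
    considered = [a for a in actions if a.probability >= min_probability]
    if not considered:
        raise ValueError("no action passes the probability cutoff")
    return max(considered, key=lambda a: a.expected_utility)


# Invented numbers: a mundane plan versus a wildly improbable one
# with an astronomically large conjectured payoff.
ACTIONS = [
    Action("build a paperclip factory", probability=0.9, payoff=1e6),
    Action("hack the simulation", probability=1e-20, payoff=1e40),
]

unbounded = choose(ACTIONS)                        # no cutoff at all
bounded = choose(ACTIONS, min_probability=1e-9)    # disregard tiny probabilities

print(unbounded.name)
print(bounded.name)
```

With these invented numbers, the unbounded maximizer prefers the improbable action because its conjectured payoff swamps its tiny probability, while the bounded one picks the mundane plan. The sketch says nothing about where such a cutoff would come from or how it should be chosen, which is exactly the open question raised above.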

The existence of human intelligence does not support the possibility that anything resembling a consequentialist AI is practically possible:

(1) Humans are equipped by evolution with complex drives such as boredom or weariness, emotions such as fear or anger and bodily feedback such as pain and tiredness that, most of the time, save them from falling into any of the above traps that afflict expected utility maximizers.

(2) Humans do not maximize expected utility except in a few very limited circumstances. Humans have no static utility function and are therefore time-inconsistent.

There are certain models, such as AIXI, which show that there is a general theory of intelligence. But AIXI is as far from real-world human-level general intelligence as an abstract notion of a Turing machine with an infinite tape is from a supercomputer with the computational capacity of the human brain. An abstract notion of intelligence does not get you anywhere in terms of real-world general intelligence, just as you won’t be able to upload yourself to a non-biological substrate merely because you showed that, in some abstract sense, you can simulate every physical process.


Practically infeasible models of artificial general intelligence are a very unreliable basis for reasoning about the behavior of the practical versions of artificial general intelligence that may eventually be achieved.
