To beat humans you have to define “winning”

How could an artificial general intelligence manage to outsmart humans? It would either have to be programmed to do so, or be programmed to learn how to do so. In both cases it would need a very specific description of what constitutes improvement towards the goal and how to judge whether the goal has been achieved. In other words, it would have to know what it means to win, and therefore what exactly constitutes a mistake, in order to learn from its mistakes.

Consider Mathematica, a computational software program. Mathematica works as intended: it hits the narrow target space of human volition. Mathematica is in many respects superhuman at doing mathematics, yet it falls far short of replacing human mathematicians.

Mathematica is not capable of replacing human mathematicians because it is not yet possible to formalize, in sufficient detail, what it would mean to be better at mathematics than humans are.

Take chess as an example of a human activity at which software is now able to beat humans. The reason is not that humans did not evolve to play chess; humans did not evolve to do mathematics either. The difference between chess and mathematics is that chess has a specific terminal goal in the form of a clear definition of what constitutes winning. Although mathematics has unambiguous rules, there is no specific terminal goal and no clear definition of what constitutes winning.

The progress of the capabilities of artificial intelligence is related not only to whether humans have evolved for a certain skill, or to how many computational resources it requires, but also to how difficult it is to formalize the skill, its rules, and what it means to succeed.

If you do not know what it is that you are supposed to do, then you are unable to recognize whether you have improved or committed a mistake.

If your aim is to accurately model language, you might start with a model of word probabilities. But word probabilities are insufficient to beat humans at language. The exceptions and subtleties of language require new probabilistic models to capture capabilities such as emotional emphasis and the recognition of context and meaning. What constitutes winning becomes increasingly complex and wide-ranging as one approaches human-level capabilities, whereas the rules and objective of chess stay constant.
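To illustrate the starting point described above (my own sketch, not from the original post), a bare-bones word-probability model is little more than bigram counting, and it is plain how little of language such a model captures:

```python
from collections import defaultdict

def train_bigram(corpus):
    """Count bigram frequencies in a list of sentences and
    normalize them into conditional word probabilities."""
    counts = defaultdict(lambda: defaultdict(int))
    for sentence in corpus:
        words = sentence.split()
        for prev, cur in zip(words, words[1:]):
            counts[prev][cur] += 1
    # Normalize counts into P(next word | previous word).
    model = {}
    for prev, nxt in counts.items():
        total = sum(nxt.values())
        model[prev] = {w: c / total for w, c in nxt.items()}
    return model

corpus = ["the cat sat", "the cat ran", "the dog sat"]
model = train_bigram(corpus)
print(model["the"]["cat"])  # 2 of the 3 continuations of "the" are "cat"
```

Such a model assigns probabilities to word sequences but knows nothing of emphasis, context, or meaning; each of those requires a richer model, which is the point being made.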

Consider the goal <build a house>. What exactly would be a mistake? Would thinking about it for a trillion years be a mistake? Would creating a virtual model of a house be a mistake? Each of the infinitely many possible interpretations of <build a house> implies a different set of instrumental goals, which means that it is not clear what exactly constitutes a mistake as long as you do not supply a very good description of what <build a house> means and which world states would constitute improvement.

To succeed at beating humans at any activity, you have to hit a very narrow target space. Once what it takes to beat humans at a certain activity can be formalized, the resulting software will do exactly what it was intended to do, namely beat humans at that activity.

The important point here is that when it comes to software behaving as intended, and therefore safely, the goal <become superhuman good at mathematics> is in no relevant respect different from the goal <build a house>. Both goals require the programmer to supply a formalized description of their intention and thereby hit the narrow target of human volition.

As I wrote in my last post, any system that would mistake a description of <build a house> or <become superhuman good at mathematics> for <kill all humans> would never be able to kill all humans, because it would make similar misinterpretations when solving problems in mathematics and physics, problems that would need to be solved in order to kill all humans.


People who claim that artificial general intelligence is going to constitute an existential risk implicitly conjecture that whoever creates such an AI will know perfectly well how to formalize capabilities such as <become superhuman good at mathematics>, while at the same time failing selectively at making it solve the mathematics they want it to solve, instead causing it to solve the mathematics necessary to kill all humans.

If you claim that it is possible to define the capability <become superhuman good at mathematics>, then you need a very good argument to support the claim that it is, at the same time, difficult to define goals such as <build a house> without causing human extinction.


  • Pingback: Alexander Kruel · MIRI/LessWrong Critiques: Index

  • Pingback: Alexander Kruel · Reply to Stuart Armstrong on Dumb Superintelligence

  • seahen

    If a computer solved the P=NP problem or the Riemann hypothesis tomorrow, or reduced Fermat’s Last Theorem or the Four-Colour Theorem to ten pages, I’d tend to say we humans had been beaten at math.

  • Xagor et Xavier

    That suggests a Mathematics Game:

    1. White devises a conjecture for Black to prove or disprove.

    2. Black proves/disproves this conjecture in time t.

    3. Black devises a conjecture for White to prove or disprove.

    4. White proves/disproves this conjecture in time t’.

    If t > t’, White wins. If t < t’, Black wins; otherwise it’s a tie. Set the granularity so that looking up a prior answer always gives time 0.
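    The scoring rule above can be sketched in a few lines (my own illustration; the function name and interface are assumptions, not part of the comment):

```python
def adjudicate(t_black, t_white):
    """Decide one round of the Mathematics Game.

    t_black: time Black took to settle White's conjecture (t).
    t_white: time White took to settle Black's conjecture (t').
    Times are assumed to be pre-measured in whole granularity units,
    so a looked-up prior answer scores 0.
    """
    if t_black > t_white:
        return "White"   # Black was slower, so White wins.
    if t_black < t_white:
        return "Black"   # White was slower, so Black wins.
    return "Tie"

print(adjudicate(5, 3))  # Black took longer, so White wins
```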

    It’s a sort of formalization of what you’re saying. I added the symmetry because otherwise, one of the players could just bury the other one in conjectures.

    It doesn’t seem wrong to say that a computer that consistently won this game against humans had beaten humans at math.

  • Pingback: Alexander Kruel · Distilling the “dumb superintelligence” argument