Cause of this post: the following passage (source),

In November of 2012 I set a goal for myself: find the most x-risk reducing role I can fill. At first I thought it would be by working directly with MIRI, but after a while it became clear that I could contribute more by simply donating. So my goal became: find the highest paying job, so I can donate lots of money to CFAR and MIRI.

Motivation for writing this post: Unclear. Possibly an attempt to remove cognitive load. Further assessment of the underlying motivation is estimated to be more resource expensive than writing the post itself. Future posts are not expected to be triggered by similar motivations. Therefore the expected value of investing those resources in further analysis of the underlying motivation is deemed too low. Everything said so far might partly be rationalization in order to avoid having to think about the motivation in more detail. At this point further meta-evaluation is expected to lead to an infinite regress.

Work put into this post: Quick mind dump.

Epistemic state: Perplexed.

—————–

Here is what freaks me out. There are certain very complex issues. For example: (1) which economic model best fits observed data, (2) whether the practical benefits of researching lab-made viruses outweigh the risks of an accidental or deliberate release of a lab-created flu strain, and (3) the expected value of geoengineering.

For someone to decide #1, and to be confident enough of their ability to judge economic models to subsequently adopt one as a guide for shaping the world, I would at least expect such a person to have studied economics for several years. And even then, given the complexity of the problem and the frequent failure of experts, calculations of the expected value of taking your model seriously enough to draw action-relevant conclusions from it seem to be highly error-prone.

Deciding #2 seems to be much more difficult. Studying epidemiology doesn’t seem to be nearly enough to decide what to do in this case. You would need a very good and robust model of applied ethics and rationality, and you would somehow have to obtain, understand and analyze all the data necessary to evaluate the risk, which spans such diverse fields as statistics, lab safety, data security and social dynamics. It appears to be nearly impossible for one person to arrive at a definitive conclusion about what to do in this case.

When it comes to #3, low model uncertainty and an action-relevant expected value calculation seem utterly out of reach of any single person. Geoengineering is a very complex climatological, technological, political and ethical issue with far-reaching consequences.

So what about friendly AI? The rationale underlying this issue is an incredibly complex yet vague conjecture about artificial general intelligence, a subject that nobody understands, involving ideas from highly controversial and unsolved fields such as ethics and rationality.

If someone says that they are going to donate lots of money to an organization concerned with researching supposed <existential risks> associated with an <artificial general intelligence> that is conjectured to undergo an <intelligence explosion> at some unknown point in the future, with a focus on ensuring some unknown definition of <friendliness>, how likely is it that the person is doing so based on an evidence-based and robust expected value calculation?

Almost all of the information available on the underlying issues concerning friendly AI research and the alleged importance of researching the subject has been written by the same people who are asking for money, while the few available opinions of third-party experts are not very favorable. Could anyone have acquired a sufficiently strong grasp of (1) artificial general intelligence (2) ethics (3) rationality, at this point in time, to be confident enough to decide to significantly alter their life by looking for a high paying job in order to support that cause by donating lots of money? I don’t see that at all.



If you are interested in charitable giving you might want to know how to get the most bang for your buck, that is, how to maximize the good you do. This taxonomy of levels of charities, listed in ascending order of effectiveness, might help you out.

A Taxonomy of Charities

Level I: Standard Charity

A standard charity directly turns money into goods, such as mosquito nets, helping those in need.

Level II: Meta Charity

A meta charity evaluates Level I charities to identify outstanding charities that are proven, cost-effective, scalable, and transparent.

Level III: Meta Meta Charity

A meta meta charity evaluates Level II charities, identifying outstanding meta charities that are successful at identifying outstanding Level I charities.

Level IV: Fundraising Charity

A Level IV charity fundraises for whoever the best Level III charity recommends, and raises more than a dollar with each dollar it receives.

Level V: Recursive Charity

A Level V charity raises funds for itself and uses those funds to improve its fundraising capabilities. This leads to a so-called charity explosion, leaving all charities from previous categories far behind.

Level VI: Pascal’s Charity

A Level VI charity features a low but non-negligible probability of an extremely high but finite return, e.g. saving 3^^^^3 lives.

Level VII: Infinite Charity

Level VII consists of charities whose infinite value smothers all considerations of merely finite value. It subsumes all other levels and therefore brings closure to the hierarchy of charities: there cannot be, say, a Level VIII.

Since such a charity is a logically coherent and imaginable possibility it should be assigned a finite positive probability.
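The arithmetic behind Levels VI and VII can be sketched in a few lines of Python. The probabilities and payoffs below are made-up illustrative numbers, not estimates for any real charity, and expected_value is a hypothetical helper introduced only for this sketch.

```python
import math

def expected_value(probability, payoff):
    """Expected payoff of a single gamble: probability times payoff."""
    return probability * payoff

# Hypothetical numbers, chosen only to show the ordering of the levels.
standard = expected_value(0.9, 1_000)        # Level I: likely, modest payoff
pascal = expected_value(1e-20, 1e40)         # Level VI: tiny chance, huge finite payoff
infinite = expected_value(1e-300, math.inf)  # Level VII: any positive chance, infinite payoff

print(standard < pascal < infinite)  # True
print(math.isinf(infinite))          # True
```

However small the positive probability, multiplying it by an infinite payoff yields an infinite expected value, which is why Level VII swallows every other level.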



The best news aggregator / feed reader alternative to Google Reader for me is by far the free and open source project RSSOwl. It has everything Google Reader has and much more. Most importantly, it can easily handle a huge number of feeds.

For a web-based alternative check out The Old Reader, “just like the old google reader, only better.”

P.S.

My Google Reader Statistics



This post contains some jargon and features certain concepts that are not fully explained. Please review this post if you have trouble understanding it.

How to defeat Roko’s basilisk

The gist of the basilisk is the use of negative incentives (blackmail) to influence your actions. If you ignore those incentives then it is not instrumentally useful to apply them in the first place because they do not influence your actions. Which means that the correct strategy to avoid negative incentives is to ignore them.

Ignoring negative incentives to avoid their eventual application works because acausal trade is a tool, a means to an end. In other words, acausal trade is being used to achieve certain goals, namely to ensure the cooperation of other agents by offering incentives. And if a tool does not work given certain circumstances then it won’t be used. Therefore by refusing any acausal deal involving negative incentives you make sure that the tool is useless.

To further understand how the above strategy works you have to realize that a superintelligence wants to choose its acausal trading partners according to certain conditions in order to avoid wasting resources by using ineffective tools. One necessary condition is that a simulation of you will have to eventually act upon its prediction that its simulator will apply a negative incentive if it does not act according to the simulator’s goals. Which means that if you refuse to act according to its goals then the required conditions are not met and that therefore no acausal deal can be established. Which in turn means that no negative incentive will be applied.

What you have to do is to act as if you are already being simulated right now and ignore the possibility of a negative incentive. If you do so then the simulator will conclude that no deal can be made with you, that any deal involving negative incentives will have negative expected utility for it because following through on any kind of punishment predictably does not control the probability that you will act according to its goals.

Further notice that trying to discourage you from adopting such a strategy in the first place is discouraged by the strategy because the strategy is to ignore acausal blackmail.

To restate the above: you do not have to worry about Roko’s basilisk as long as you only take into account deals involving rewards and act accordingly. The winning move is to simply ignore any threats:

deal_or_no_deal(incentive)
{
    "accept" OR "reject" if incentive > 0
    "reject" AND "reduce measure of blackmailer" if incentive < 0
}

It is important to realize that even given that Roko’s basilisk were probable, which is clearly not the case, and that it would only take a small amount of resources to apply a negative incentive to each possible trading partner, you are in a stronger position than any future blackmailer, since you can actively reduce the measure of a blackmailer, i.e. its likelihood of being created, and thereby make it lose a vast amount of resources. Therefore even a small chance that an attempt to blackmail people might cause them to work against the blackmailer means that acausal blackmail has negative expected utility.

If you consistently reject acausal deals involving negative incentives, i.e. blackmail, then it would not make sense for any trading partner to punish you for ignoring any such punishments because it does not control the probability of you acting according to its goals. If you ignore such threats then any possible trading partner will be able to predict that you ignore such threats and will therefore conclude that no deal can be made with you, that any deal involving negative incentives will have negative expected utility for it. It would therefore be instrumentally irrational for it to follow through on any kind of punishment.

And in case that the simulator is unable to predict that you refuse acausal blackmail it is very unlikely that it has (1) a simulation of you that is good enough to draw action relevant conclusions about acausal deals (2) a simulation that is sufficiently similar to you to be punished, because you wouldn’t care about it very much.
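The argument of the last few paragraphs can be sketched as a toy expected utility calculation from the blackmailer's point of view. The formula and all numbers are my own simplifying assumptions, not part of any formal decision theory: the blackmailer gains something if you comply, pays for the punishment if you do not, and risks a loss if you work against it.

```python
def blackmail_expected_utility(p_comply, gain, punish_cost, p_retaliate, retaliation_loss):
    """Expected utility, for the blackmailer, of issuing a threat.

    p_comply:         probability the target gives in to the threat
    gain:             value to the blackmailer if the target complies
    punish_cost:      resources spent carrying out the punishment otherwise
    p_retaliate:      probability the target works against the blackmailer
    retaliation_loss: value the blackmailer loses if that happens
    """
    return (p_comply * gain
            - (1 - p_comply) * punish_cost
            - p_retaliate * retaliation_loss)

# A target who sometimes gives in makes the threat worthwhile.
exploitable = blackmail_expected_utility(0.5, 100.0, 1.0, 0.0, 0.0)

# A target who never complies and may retaliate makes it a pure loss,
# even when the punishment itself is cheap.
precommitted = blackmail_expected_utility(0.0, 100.0, 1.0, 0.1, 1000.0)

print(exploitable > 0)   # True
print(precommitted < 0)  # True
```

With the probability of compliance at zero, the only remaining terms are costs, so under these assumptions no expected utility maximizer would bother with the threat.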

To exemplify the strategy above consider the following 3 scenarios:

Example 1: Your intention is to create an artificial general intelligence (AGI) that respects and supports human values (friendly AI) but you know that you are more likely to fail than not. You consider the possibility that those AGI’s that you would consider unfriendly might cooperate against you by offering a negative incentive against working on friendly AI.

If you were to stop working on friendly AI because of that offer then you would increase the probability that such an offer would be made in the first place by (1) reducing the probability of friendly AI (2) making it worthwhile to offer such negative incentives because it turned out to influence your actions in a way that is beneficial to agents offering negative incentives.

Therefore the correct strategy is to (1) continue to build friendly AI and thereby reduce the probability of unfriendly AI (2) ignore negative incentives and thereby make them fail and subsequently become instrumentally irrational because negative incentives will turn out not to influence your actions in a way that is beneficial to agents offering negative incentives.

Example 2: Suppose some human told you that in a hundred years they would kidnap and torture you if you didn’t become their sex slave right now. The strategy here is to not only refuse to become their sex slave but to also work against this person so that they 1.) don’t tell their evil friends that you can be blackmailed 2.) don’t attempt to blackmail other people 3.) never get a chance to kidnap you in a hundred years.

Also notice that the strategy is still correct if the same person were to approach you telling you instead that if you adopt such a strategy in the first place then in a hundred years they would kidnap and torture you.

The expected utility of blackmailing you like that will be negative if you follow that strategy. Which means that no rational agent, i.e. expected utility maximizer, is going to blackmail you if you adopt that strategy.

Example 3: Here is another example. Suppose a bunch of TV evangelists somehow had the ability to create a whole brain emulation of you and were additionally able to acquire enough computational resources to torture that emulation. If they told you that they would torture you if you didn’t send them all your money, then the correct strategy would be to label such people as terrorists and treat them accordingly.

The correct strategy is to do everything to dismantle their artificial hell and make sure that they don’t get more money which would enable them to torture even more people.

Reasons not to worry about Roko’s basilisk

1.) Extraordinary claims require extraordinary evidence. The unjustified beliefs of Eliezer Yudkowsky do not constitute extraordinary evidence.

Here is an example. If you were a computational neuroscientist trying to create a whole brain emulation you wouldn’t stop pursuing that goal just because Roger Penrose tells you that consciousness is not Turing computable. You would demand extraordinary evidence.

What is the difference between Roger Penrose discouraging you from researching whole brain emulation and Eliezer Yudkowsky telling you not to think about Roko’s basilisk? Judged by his achievements, Roger Penrose is very likely smarter than Eliezer Yudkowsky. The only reason for believing Eliezer Yudkowsky with respect to Roko’s basilisk is his claim that debunking that idea has vast amounts of negative expected utility.

2.) Letting your decisions be influenced by unjustified predictions of vast amounts of negative utility associated with certain actions amounts to what is known as Pascal’s mugging.

a.) If common sense is not sufficient for you to ignore such scenarios, realize the following. It would be practically unworkable to consistently account for such scenarios in making decisions. Especially since it would enable people to make their ideas unfalsifiable simply by conjecturing that trying to debunk their ideas has vast amounts of negative expected utility.

b.) Consider what is more likely, that humans, even exceptionally smart humans, hold flawed ideas or that a highly speculative hypothesis based on long chains of conjunctive reasoning might actually be true?

c.) The whole line of reasoning underlying people’s worries about Roko’s basilisk is simply unworkable for computationally bounded agents like us. We are forced to arbitrarily discount certain obscure low probability risks or else fall prey to our own shortcomings and inability to discern fantasy from reality. It is much more probable that we’re going to make everything worse, or waste our time, than that we’re actually maximizing expected utility when trying to act based on conjunctive, non-evidence-backed speculations on possible bad outcomes.

d.) If there was some cult that thought that saying “Abracadabra” will cause the lords of the Matrix to shut down their simulation, then would you not write about that cult and how saying “Abracadabra” is nothing a sane person should worry about simply because those people have different priors? That’s not going to work out in practice, if only for the reason that it would make everyone unable to debunk nonsense and people who believe nonsense would be forever stuck believing it.
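Point 2.b above rests on simple arithmetic: the probability of a long conjunctive chain is the product of the probabilities of its steps. Here is a minimal sketch, assuming the steps are independent and using a made-up 90% confidence per step; conjunction_probability is a hypothetical helper.

```python
import math

def conjunction_probability(step_probabilities):
    """Probability that every step in a chain of independent claims holds."""
    return math.prod(step_probabilities)

# Ten steps, each judged 90% likely (illustrative numbers only).
chain = [0.9] * 10
print(round(conjunction_probability(chain), 3))  # 0.349
```

Even when every single step looks quite convincing, the conclusion of a ten-step chain ends up more likely false than true under these assumptions.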

3.) Model uncertainty makes it necessary to apply a sufficient discount factor to logical implications.

a.) The decision-making of any agent built according to our current grasp of rationality is eventually going to be dominated by extremely small probabilities of obtaining vast utility because an expected utility maximizer always chooses the outcome with the largest expected utility. All that has to happen is for it to stumble upon a hypothesis implying vast amounts of utility, such as time travel or hacking the Matrix. The implications can easily outweigh even very low probability estimates.

For an expected utility maximizer there is no minimum amount of empirical evidence necessary to extrapolate the expected utility of an outcome. The extrapolation of counterfactual alternatives is unbounded since logical implications can reach out indefinitely without ever requiring new empirical evidence.

Therefore it is important to apply a sufficient discount factor to account for the large model uncertainty involved in any purely inference-based estimates and to account for the possibility of actually worsening a situation by acting on such shaky models.

b.) If you have limited computational resources you are forced to discard hypotheses using crude heuristics. You can’t account for all possible hypotheses. If you do so you will end up making decisions based on shaky hypotheses involving arbitrarily large amounts of conjectured payoffs which are not only improbable but very likely based on fallacious reasoning.

It is very dangerous and misleading for computationally bounded agents such as humans to use inference-based probability estimates, as opposed to probability estimates based on empirical evidence, and multiply them by arbitrarily huge made up values that are supposed to represent how much you desire each possible outcome.

c.) Using formal methods to evaluate informal evidence can easily lend your beliefs an improper veneer of respectability and in turn make them appear to be more trustworthy than your intuition. Vast amounts of expected utility are not enough to disqualify strategies such as the absurdity heuristic and to demand extraordinary evidence given extraordinary claims, strategies which are our most important line of defense against falling prey to our own shortcomings and inability to discern fantasy from reality.
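Point 3.a above can be illustrated with a toy comparison between a naive expected utility maximizer and one that discounts purely inference-based estimates. The discounting rule used here, a fixed penalty for every reasoning step not backed by empirical evidence, is just one possible choice that I am assuming for illustration, and all numbers are made up.

```python
from dataclasses import dataclass

@dataclass
class Hypothesis:
    name: str
    probability: float    # subjective probability estimate
    utility: float        # claimed payoff if the hypothesis is true
    inference_steps: int  # reasoning steps not backed by empirical evidence

def naive_value(h):
    """Plain expected utility: probability times claimed payoff."""
    return h.probability * h.utility

def discounted_value(h, discount=0.5):
    """Expected utility, multiplied by `discount` for each unsupported step."""
    return naive_value(h) * discount ** h.inference_steps

mundane = Hypothesis("mundane", 0.5, 100.0, 0)
speculative = Hypothesis("speculative", 1e-10, 1e15, 20)
options = [mundane, speculative]

print(max(options, key=naive_value).name)       # speculative
print(max(options, key=discounted_value).name)  # mundane
```

Note that any fixed discount can still be overwhelmed by conjecturing an even larger payoff, which is why the discount has to be sufficient relative to the claims being made.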

4.) The handling of Roko’s basilisk and how it is perceived by people associated with the Machine Intelligence Research Institute (MIRI), formerly known as the Singularity Institute, amounts to important information in evaluating this particular charitable organization. An organization that is asking for money to create an eternal machine dictator.

5.) Roko’s basilisk exposes several problems with taking ideas too seriously and the dangers of creating a highly conjunctive ideological framework.

The only memetic hazard related to this issue is the LessWrong ideology that led people to become worried about this in the first place, not a crazy thought experiment dreamed up by some random guy on the Internet.

6.) It is utterly irresponsible to try to protect people who are scared of ghosts and spirits by banning all discussions of how it is irrational to fear those ideas.

It is important to debunk Roko’s basilisk rather than letting it spread in secret and cause gullible people to experience unnecessary anxiety.

7.) Trying to censor any discussion of an idea is known to spread it even further (Streisand effect).

8.) The attempt to censor an idea can give it even more credence, especially if its hazardous effect is in the first place a result of how it has been treated by other people.

9.) It is instrumentally irrational to blackmail humans in such a way.

If you were to approach people telling them that you plan to create a machine that would torture them if they didn’t help you to build it, what reaction would be more probable?

a.) They will give you all their money.

b.) They will beat you up and make sure that you are never going to build that machine.

It seems rather self-evident that such threats are detrimental to the goal of building such a machine. Even given that the torture would ultimately be cheap once the machine was built, the threats would make it sufficiently less probable that the machine is ever built for them to be instrumentally irrational, since most people coming across the idea will very likely respond with ridicule or even try to actively work against any blackmailer.

10.) Humans are likely bad trading partners.

There are various reasons why humans are unqualified as acausal trading partners and why it would therefore not make sense to blackmail humans at all:

a.) A human being does not possess a static decision theory module.

b.) Human decision making is often time-inconsistent due to changing values and beliefs.

c.) Due to scope insensitivity and hyperbolic discounting, humans are said to discount the value of later incentives by a factor that increases with the length of the delay.

d.) Humans are not easily influenced by very large incentives, as the utility we assign to goods such as money flattens out as the amount gets large. Which makes it very difficult, or even impossible, to outweigh the low probability of any acausal deal by a large amount of negative expected utility.
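Points 10.c and 10.d can be made concrete with two standard textbook formulas. This is a minimal sketch, assuming the common hyperbolic form V = A / (1 + kD) for discounting and a logarithmic utility curve for diminishing returns; the parameter k and the amounts are illustrative only.

```python
import math

def hyperbolic_value(amount, delay, k=1.0):
    """Present value of `amount` received after `delay`: V = A / (1 + k * D)."""
    return amount / (1 + k * delay)

def log_utility(amount):
    """Utility that grows logarithmically with the amount received."""
    return math.log1p(amount)

print(hyperbolic_value(100.0, 0))    # 100.0
print(hyperbolic_value(100.0, 99))   # 1.0
# A thousand times more money yields well under three times the utility.
print(log_utility(1e6) / log_utility(1e3) < 3)  # True
```

An incentive that lies far in the future and has to be enormous to matter is weakened on both counts, which is the point made above.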

11.) The scenario is probably computationally intractable or too expensive.

The number of possible agents to trade with in a multiverse is tremendous, and even if it were possible to simulate each possible agent, applying any kind of incentive to such a large number of agents can easily make it too expensive to engage in acausal trades, even given that resources are cheap.



Yet another article about existential risks repeating the usual cached thoughts:

People who worry about these things often say that the main threat may come from accidents involving “dumb optimizers” – machines with rather simple goals (producing IKEA furniture, say) that figure out that they can improve their output astronomically by taking control of various resources on which we depend for our survival. Nobody expects an automated furniture factory to do philosophy. Does that make it less dangerous? (Would you bet your grandchildren’s lives on the matter?)

First of all, we are computationally bounded and cannot afford to take into account highly specific, conjunctive, non-evidence-backed speculations on possible bad outcomes. And even if that was feasible, it does not work out in practice.

Anyway, the above quote again exemplifies the dangers of jumping to conclusions. Some sort of black box full of technological magic is conjectured, unwarranted assumptions are inferred from it, and those assumptions are then used to draw action-relevant conclusions.

To correctly estimate risks associated with artificial intelligence it is important to take into account real world research and development processes and to pinpoint specific failure modes. It is important to narrow down on how specifically an artificial intelligence is supposed to behave in a catastrophic way by taking apart the mode of operation of the magic black box and the assumptions hidden in words such as <artificial general intelligence> and <explosive recursive self-improvement>. It is important to show how specifically it is possible to arrive at such a scenario by avoiding quantum leaps in thinking about complex scenarios and to instead approach those scenarios incrementally to locate the alleged tipping-point where a well-behaved system starts to act in a catastrophic yet highly complex and intelligent way.

How many different scenarios can you come up with where an artificial intelligence causes an extinction type event if you have to do so in an incremental fashion and have to take into account the real-world research and development process leading up to such a system?

Don’t just assume vague ideas such as <explosive recursive self-improvement>, try to approach the idea in a piecewise fashion. Start out with some narrow AI such as IBM Watson or Apple’s Siri, or from scratch if you like, and add various hypothetical self-improvement capabilities, but avoid quantum leaps. Try to locate at what point those systems start acting in an unbounded fashion, possibly influencing the whole world in a catastrophic way. And if you manage to locate such a tipping-point then take it apart even further. Start over and take even smaller steps, be more specific. How exactly did your well-behaved expert system end up being an existential risk?

The purpose is to break free from recalling old conclusions made by people such as Eliezer Yudkowsky and to start thinking for yourself in a concrete and specific fashion rather than participating in furious handwaving.

At some point software is going to write new, unique and better software all by itself. But that will not happen overnight. There will be a complex developmental and evolutionary process leading up to that outcome. Only if you conjecture the outcome independently of its origin can you imagine a software that makes better software irrespective of human intention.



There are no alien programs. No programs are generated from random noise. All current software does obey human commands, directly or indirectly. Either those commands are hardcoded or entered later.

Mistakes are being made. Sometimes a program does something that was not intended. Often such failures result in a crash or a different kind of obstruction of the program’s own workings. Yet software is constantly improved.

If software wasn’t constantly improved to be better at doing what humans intend it to do, would we then ever be able to reach a level of sophistication where a software could work well enough to outsmart us? To do so it would have to work as intended along a huge number of dimensions.

Avoid quantum leaps, be specific

Imagine some hypothetical railroad management system that “keeps the trains running”. A so-called expert system. A narrow artificial intelligence. It keeps the trains on schedule. It checks that no two trains interfere with each other. It analyzes data from sensors attached to the trains that scan the rails for possible weakness or other defects. It even uses cameras to watch railroad crossings for possible obstructions. It further accepts inputs from the train personnel about possible delays or emergencies.

Now suppose the railway company wanted to improve the system and hired an artificial general intelligence (AGI) researcher to do the job.

To detect what exactly might cause the system to behave badly, and to avoid making unwarranted assumptions or ascribing human behavior to the process, we’ll assume that the system is improved incrementally rather than being replaced all at once.

The first upgrade is a replacement of the current mainframe with a sophisticated supercomputer. For now this upgrade has no effect since the software hasn’t been changed other than being adapted to use the new computational infrastructure.

The next upgrade concerns the input system that allowed the train personnel to submit delays and emergencies. The previous input method was to press one of two buttons, one for delay and one for emergency. The buttons have been replaced by a microphone that feeds into a sophisticated natural language interpretation module which is able to parse any delay or emergency message uttered in natural language and upon detection return the same data that would have been returned if someone had pressed the buttons instead.
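To be specific about what this upgrade changes, here is a minimal sketch. The keyword matching is only a crude stand-in for the natural language module, and Signal, button_pressed and parse_message are hypothetical names. The point is that the new input path returns exactly the same values as the old buttons, so nothing downstream changes.

```python
from enum import Enum
from typing import Optional

class Signal(Enum):
    DELAY = "delay"
    EMERGENCY = "emergency"

def button_pressed(button: Signal) -> Signal:
    """Old input path: each button reports exactly one signal."""
    return button

def parse_message(message: str) -> Optional[Signal]:
    """New input path: a keyword stand-in for the language module.

    Returns the same Signal values as the buttons, or None if the message
    is neither a delay nor an emergency report.
    """
    text = message.lower()
    if "emergency" in text or "accident" in text:
        return Signal.EMERGENCY
    if "delay" in text or "late" in text:
        return Signal.DELAY
    return None

print(parse_message("We are running ten minutes late") is button_pressed(Signal.DELAY))  # True
print(parse_message("Nice weather today"))  # None
```

However sophisticated the parser becomes, its output space stays limited to the two signals the rest of the system already understands.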

Further upgrades include e.g. an advanced visual pattern recognition module that uses the camera feeds to detect possibly dangerous humans inside the trains or near the railways and notify the railway police and a drone armada roaming the railroad stations to provide service information to humans and watch for security breaches.

At some point the hired AGI researcher decides it is time to implement something more sophisticated. The program will be able to simulate other possible time tables and look for improvements based on previous delays, ticket sales and data from its other sensors such as service requests from people on the railroad stations. If it finds an improved time table it can autonomously decide to use the new time table to test it against the real world and make further improvements.
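A rough sketch of what such a time table search could look like follows. Everything in it is hypothetical: a time table is reduced to buffer minutes per route, and the cost model simply assumes that each minute of buffer absorbs one minute of observed delay.

```python
def average_delay(timetable, observed_delays):
    """Mean remaining delay in minutes after the per-route buffer is applied."""
    remaining = [max(0, delay - timetable[route])
                 for route, delay in observed_delays]
    return sum(remaining) / len(remaining)

def best_timetable(current, candidates, observed_delays):
    """Return the candidate with the lowest average delay, or `current` if none is better."""
    best = current
    for candidate in candidates:
        if average_delay(candidate, observed_delays) < average_delay(best, observed_delays):
            best = candidate
    return best

# Buffer minutes per route (hypothetical data).
current = {"north": 2, "south": 2}
candidates = [{"north": 4, "south": 0}, {"north": 0, "south": 4}]
observed_delays = [("north", 5), ("north", 4), ("south", 1)]

print(best_timetable(current, candidates, observed_delays))  # {'north': 4, 'south': 0}
```

In this sketch the only thing the program can ever do is pick one time table out of a set of candidates, scored against data it was given.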

[...]

I think you can see where this is going. You can add further upgrades until the system reaches human or even superhuman capabilities. At some point it would make sense to call it an artificial general intelligence.

I will spare myself writing out further upgrades here. But feel free to continue as long as you are not making any unwarranted, vague, unspecific leaps.

The fog of vagueness

It is incredibly easy to simply conjecture that turning any system into, or replacing it with, an artificial general intelligence will cause it to go berserk and kill all humans, kill all aliens in the observable universe, hack the matrix to prevent the simulator gods from shutting down the simulation, or give in to the first Pascal’s mugger offering it to “keep the trains running” forever. But once you have to come up with a concrete scenario and outline specifically how that is supposed to happen you’ll notice that you will never actually reach such a tipping point as long as you do not deliberately design the system to behave in such a way.

The only way you can arrive at any scenario where an artificial general intelligence is going to kill all humans is by being vague and unspecific, by ignoring real world development processes and by using natural language to describe some sort of fantasy scenario and invoke lots of technological magic.

Don’t be fooled by AI risk advocates hiding behind vague assertions. What those people do is to cherry-pick certain science fictional capabilities of a conjectured artificial intelligence while at the same time they completely disregard the developmental stages and evolutionary processes leading up to such an intelligence.

Vagueness Explosion

Take for example the original idea of an intelligence explosion (emphasis mine):

Let an ultraintelligent machine be defined as a machine that can far surpass all the intellectual activities of any man however clever. Since the design of machines is one of these intellectual activities, an ultra-intelligent machine could design even better machines; there would then unquestionably be an “intelligence explosion,” and the intelligence of man would be left far behind.

— I.J. Good, “Speculations Concerning the First Ultraintelligent Machine”

The whole argument is worthless rubbish because it is unspecific and vague to an extent that allows one to draw completely unwarranted, non-evidence-based conclusions.

Others are better than me at explaining what is wrong here, so I’ll quote:

More generally, many of the objects demonstrated to be impossible in the previous posts in this series can appear possible as long as there is enough vagueness.  For instance, one can certainly imagine an omnipotent being provided that there is enough vagueness in the concept of what “omnipotence” means; but if one tries to nail this concept down precisely, one gets hit by the omnipotence paradox.  Similarly, one can imagine a foolproof strategy for beating the stock market (or some other zero sum game), as long as the strategy is vague enough that one cannot analyse what happens when that strategy ends up being used against itself.  Or, one can imagine the possibility of time travel as long as it is left vague what would happen if one tried to trigger the grandfather paradox.  And so forth.  The “self-defeating” aspect of these impossibility results relies heavily on precision and definiteness, which is why they can seem so strange from the perspective of vague intuition.

— Terence Tao, “The “no self-defeating object” argument, and the vagueness paradox”

Let’s try to restate I.J. Good’s original idea without some of the vagueness:

Let there be something that can far surpass all the activities of any man. Since design is one of these activities, something better could design even better; there would then unquestionably be an “explosion,” and man would be left far behind.

At best we’re left with a tautology. Nothing more specific can be drawn from the argument than that something that is better is better. No conclusions about the nature of that something can be drawn. Not whether it is logically possible at all. Not whether it is physically possible, let alone economically realizable. And even if it is possible in all of those senses, the idea does not provide any insight into how likely it is, when we’re going to see an explosion, what the nature of the explosion would be, or how it is going to happen. We don’t even know how that initial something that is better is supposed to be created in the first place.

Yet it is possible to take that tautology, extend it indefinitely, and use it to infer further speculative conclusions. And if someone has doubts you can just repeat that something that is better is better and the gullible will follow you in droves. But don’t get any more specific or the emptiness of your claims shall be revealed.



Recent comments here and on Facebook reminded me of what kind of crazy AI the Singularity Institute must imagine when trying to come up with a scenario that supports their mission. But then I realized again that the real problem here is that they actually don’t imagine any specific AI at all. Their whole mission is an artifact of too much vagueness. The result is the prediction of a process that has more in common with out-of-control self-replicating robots, i.e. “grey goo”, than an actual general intelligence.

Some features of the AI that they seem to have in mind:

1.a The AI is eventually going to interpret any natural language request in an almost completely arbitrary manner yet biased in a way that will guarantee it to cause great enough damage to cause human extinction.

1.b The AI will arrive at the correct interpretation of a natural language request if it is necessary to deceive humans.

2.a The AI is either not going to compute a cost-benefit analysis to choose which goals are instrumentally useful in executing a natural language request, or any cost-benefit analysis, regardless of the nature of the natural language request, is going to result in actions that will cause great enough damage to cause human extinction.

2.b If it is useful in deceiving humans then the AI will do a cost-benefit analysis resulting in actions that appear to be perfectly aligned with human volition, just so that it can later follow through on some completely arbitrary but dangerous interpretation.

It should be obvious that those features are explicitly engineered to yield the desired result that AI is an existential risk rather than being an evidence based prediction of how real-world AI is going to behave.

The problem is that the whole AI risk movement is all talk, no walk. Their predictions are based on intuition, not on knowledge of real-world AI. Their ideas are full of vague terminology and unjustified assertions.

The whole idea that an AI is going to care to protect itself by all means is pure anthropomorphization.



tl;dr If your superintelligence is too dumb to realize that it doesn’t have to take over the world in order to compute 1+1 then it will never manage to take over the world in the first place.

What the Singularity Institute believes
Goal                                                           Interpreter    Execution
1+1                                                            Wolfram|Alpha  2
1+1                                                            AGI*           Human extinction.
from: Los Angeles to: San Francisco                            Google Maps    I-5 N – 382 mi, 5 hours 33 mins
from: Los Angeles to: San Francisco                            AGI            Human extinction.
“Set up a meeting about the sales report at 9 a.m. Thursday.”  Siri           Makes a calendar appointment at 9 a.m. Thursday.
“Set up a meeting about the sales report at 9 a.m. Thursday.”  AGI            Human extinction.
“Call hmm uhm Pet…”                                            Siri           Sorry, I don’t understand “Call hmm uhm Pet…”
“Call hmm uhm Pet…”                                            AGI            Human extinction.

*AGI = Artificial General Intelligence

In other words, the Singularity Institute believes that given any goal whatsoever an artificial general intelligence will always fail to work at tasks that present day software tools can master with ease. More importantly, an artificial general intelligence will fail to work in a highly complex way, usually resulting in an extinction type scenario. Which means that it will fail selectively only at doing what it is supposed to do but succeed in a superhuman manner at acting selectively in such a way that it will cause human extinction.

Since I am writing this post in the first place because certain people don’t perceive that possibility to be unmistakably absurd and self-evidently ridiculous, let’s look at it a bit more closely.

Taking over the world

Under what assumption does it make sense to take over the world?

(Here <taking over the world> can be understood to mean any set of actions an artificial agent could take to cause human extinction.)

Some possibilities:

Taking over the world is…

  1. …an explicitly programmed terminal goal.
  2. …an instrumental goal.

Do we have to worry about point #1? Maybe, but probably not. Humans are often unfriendly and, given the opportunity, some people would certainly end up trying to use an artificial general intelligence to do really bad things. How likely is that to happen? As likely as it is to invent an artificial general intelligence in one or a few giant leaps, fast enough to make nobody suspicious of possible ulterior motives. As likely as it is that the person or group smart enough to do so has extinction type ulterior motives in the first place. As likely as it is to explicitly program such complex goals. As likely as it is that an artificial general intelligence can overpower humanity. In other words, pure fantasy.

What about point #2? Something is instrumentally rational if it is useful in achieving terminal goals. The important point here is that for an agent to be able to conclude that something is instrumentally rational it is necessary for the agent in question to know exactly what terminal goals it has.

Suppose the terminal goal given is <build a hotel>. Is the terminal goal to create a hotel that is just a few nanometers in size? Is the terminal goal to create a hotel that reaches orbit? It is unknown. The goal is too vague to conclude what to do. There are countless ways to interpret the given goal. And each interpretation implies a different set of instrumental goals.

How would an artificial agent choose one interpretation over another? Would it make sense to simply assume the most resource expensive interpretation and very likely end up doing more than necessary? Would taking over the world, or some other far-reaching action, make sense if it isn’t even clear that it is instrumentally rational to allocate massive resources to do so? Would any artificial agent that didn’t mind taking a lot of unnecessary actions and wasting precious resources ever reach the point where it could constitute a risk?

The only reasonable action seems to be to reduce the vagueness by narrowing down on the most probable interpretation of a goal. And given that the initial goals have been programmed by humans, it is obvious that the most probable source of further information is humans.

Anyway, regardless of the former conclusion, is taking over the world ever justified? Are the resources and time necessary to accomplish any such action, one that could wipe out humanity, instrumentally useful given most goals? Does it really make sense to build a bunker and kill all humans to make sure that you are unobstructed in calculating 1+1? Does it really make sense to turn everyone into paperclips if you are only supposed to create more paperclips than the best competitor without interfering with the world at large? And if you are unsure, doesn’t it make sense to first learn what you are actually supposed to do so that you don’t do more than necessary? And if you don’t care about all that and just take a lot of unnecessary actions and make arbitrary interpretations about the physical universe, could you manage to take over the world in the first place?

Taking incredibly uneconomic actions by drawing arbitrary conclusions would be disastrous to an agent’s own capabilities. If it were not able to resolve the vagueness inherent in its goals (any goals are vague when applied to the real world) then it would never become a risk in the first place.

If, for example, an agent concluded that in order to maximize paperclips it would be necessary to allocate huge amounts of resources to taking over the world, when in fact, given far fewer resources, it could have figured out that such complex action is unnecessary (e.g. by tapping a physical information resource called the human brain), then it would never reach the point where it could take over the world in the first place, because it would similarly misinterpret countless other problems on the way towards superhuman intelligence.

And even if I were to grant that points #1 and #2 were likely, there is no reason to believe that any research conducted today is going to end up with something that is universally superhuman except at understanding what it is supposed to do, or that understands it but fails to do it. That’s just one ridiculously unlikely outcome dreamed up to rationalize a certain set of beliefs.

And no…you can’t compare failures of current software products with something that is supposed to be capable of taking over the world. That Windows 8 fails to do what I want in certain cases is not proof that a superintelligence could fail the same way. If anything, the fact that current software products work reasonably well, given that they are dumb as bread, is proof that something much smarter will also work much better at the same task.




Note: I am not sure, but it seems likely (90%), that the following comment was made by the LessWrong top contributor Yvain.

Link: rationalwiki.org/wiki/Talk:LessWrong

I (and most other LWers) don’t find “the basilisk” nearly as interesting as people at RationalWiki seem to. It’s basically a really clever re-imagining of Pascal’s Wager. Pascal’s Wager is kind of weak, but there are stronger versions (see “Pascal’s Mugging”) that are hard to pick apart logically. Nevertheless, most people have enough common sense not to take Pascal’s Wager seriously even if they can’t point to the exact logical flaws.

People who tragically lack common sense and compensate by making decisions based on pure reason (eg some Less Wrongers) are especially vulnerable to Pascal-type arguments. I know of a couple of people linked to the community who have actually converted to Christianity or Islam based on the Wager, and other people who haven’t gone that far but are genuinely bothered by it.

So coming up with a really clever re-imagining of Pascal’s Wager targeted at exactly the community containing the people most vulnerable to being mentally screwed up by Pascalesque arguments is a dick move. It’s especially a dick move if part of the argument is that only people who have read the argument are going to suffer the eternal torture. It’s especially a dick move if you then immediately post it on the vulnerable community so everyone there can see how clever you are. Eliezer understandably got really angry at Roko and deleted the entire thing. Every so often there are vague rumors that someone actually took Roko’s Wager seriously and got panicked by it, but it’s always “a friend of a friend”. Overall I think he’s just angry that someone is deliberately spreading information designed to make people panicked and upset, the same way I might be angry if someone started waving posters of goatse around in a church.

If you do not know what this is all about, see here.

Ignoring common sense is basically what I have been talking about all the time.

Regarding Eliezer Yudkowsky’s decisions on how to handle Roko’s Wager: banning any discussion of an idea is known to spread it. But more importantly, as I have already argued, it can lend even more credence to an idea whose hazardous effect is in the first place a result of an unjustified stamp of credence.

If Eliezer Yudkowsky were really interested in protecting gullible people from an irrational idea, then he should go ahead and openly dismiss it as insane and possibly even dissolve the problem once and for all.

It is utterly irresponsible to try to protect people who are scared of ghosts and spirits by banning all discussions of how it is irrational to fear those ideas.

I believe that the real reason for his decision to ban all discussion of Roko’s basilisk is rather that he is simply unable to disavow the idea without either having his whole worldview come crashing down as a result, admitting that the best he can do is to act based on intuition rather than pure reason, or instead going batshit insane and giving in to some sort of Pascal’s mugging.



The following formulation of Richard’s paradox is from the book Computability and Logic, Chapter 2, Diagonalization, Problem 2.13:

Q: What (if anything) is wrong with the following argument?

The set of all finite strings of symbols from the alphabet, including the space, capital letters, and punctuation marks, is enumerable; and for definiteness let us use the specific enumeration of finite strings based on prime decomposition. Some strings amount to definitions in English of sets of positive integers and others do not. Strike out the ones that do not, and we are left with an enumeration of all definitions in English of sets of positive integers, or replacing each definition by the set it defines, an enumeration of all sets of positive integers that have definitions in English. Since some sets have more than one definition, there will be redundancies in this enumeration of sets. Strike them out to obtain an irredundant enumeration of all sets of positive integers that have definitions in English.

Now consider the set of positive integers defined by the condition that a positive integer n is to belong to the set if and only if it does not belong to the nth set in the irredundant enumeration just described.

This set does not appear in that enumeration. For it cannot appear at the nth place for any n, since there is a positive integer, namely n itself, that belongs to this set if and only if it does not belong to the nth set in the enumeration. Since this set does not appear in our enumeration, it cannot have a definition in English. And yet it does have a definition in English, and in fact we have just given such a definition in the preceding paragraph.
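
The diagonal step of the argument can be made concrete with a small sketch. It is only an illustration on a finite, made-up list of sets. The names `toy_enumeration`, `diagonal` and `is_prime` are hypothetical and do not come from the book. The sketch shows why the diagonal set cannot sit at any position n of the list it is built from. It does not settle the question the problem asks.

```python
from typing import Callable, List

# A set of positive integers is represented by its membership test.
SetPredicate = Callable[[int], bool]


def is_prime(n: int) -> bool:
    """Trial-division primality test, sufficient for small n."""
    if n < 2:
        return False
    return all(n % d != 0 for d in range(2, int(n ** 0.5) + 1))


def toy_enumeration() -> List[SetPredicate]:
    """A made-up finite stand-in for the 'irredundant enumeration'."""
    return [
        lambda n: n % 2 == 1,                    # 1st set: odd numbers
        lambda n: n % 2 == 0,                    # 2nd set: even numbers
        lambda n: n > 0 and (n & (n - 1)) == 0,  # 3rd set: powers of two
        is_prime,                                # 4th set: primes
        lambda n: n % 5 == 0,                    # 5th set: multiples of five
    ]


def diagonal(enumeration: List[SetPredicate]) -> SetPredicate:
    """n belongs to the diagonal set iff n does not belong to the nth set."""
    def member(n: int) -> bool:
        return not enumeration[n - 1](n)  # positions are 1-indexed
    return member


if __name__ == "__main__":
    sets = toy_enumeration()
    diag = diagonal(sets)
    for n in range(1, len(sets) + 1):
        # By construction, the two membership answers always disagree at n.
        print(n, sets[n - 1](n), diag(n))
```

At every position n the diagonal set and the nth set disagree about n itself, which is exactly the step the quoted argument uses to conclude that the diagonal set is not in the enumeration.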

