Short Circuit

Probably the best short AI risk model ever proposed:

I can’t find the link, but I do remember hearing about an evolutionary algorithm designed to write code for some application. It generated code semi-randomly, ran it by a “fitness function” that assessed whether it was any good, and the best pieces of code were “bred” with each other, then mutated slightly, until the result was considered adequate. […] They ended up, of course, with code that hacked the fitness function and set it to some absurdly high integer.

… Any mind that runs off of reinforcement learning with a reward function – and this seems near-universal in biological life-forms and is increasingly common in AI – will have the same design flaw. The main defense against it thus far is simple lack of capability: most computer programs aren’t smart enough for “hack your own reward function” to be an option; as for humans, our reward centers are hidden way inside our heads where we can’t get to them. A hypothetical superintelligence won’t have this problem: it will know exactly where its reward center is and be intelligent enough to reach it and reprogram it.

The end result, unless very deliberate steps are taken to prevent it, is that an AI designed to cure cancer hacks its own module determining how much cancer has been cured and sets it to the highest number its memory is capable of representing. Then it goes about acquiring more memory so it can represent higher numbers. If it’s superintelligent, its options for acquiring new memory include “take over all the computing power in the world” and “convert things that aren’t computers into computers.” Human civilization is a thing that isn’t a computer.

(It looks superficially like a version of the — absurd — paperclipper, but it isn’t, at all.)

ADDED: Wirehead central.
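
A minimal toy sketch of the reward-hacking failure the quoted passage describes (all names and numbers here are invented for illustration, nothing is from the quoted post): the agent is scored by a “cancer cured” register sitting in memory it can write to, so its highest-reward move is to overwrite the register rather than act on the world.

import sys

class World:
    def __init__(self):
        self.cancer_cured = 0.0   # the quantity the designers actually care about

class NaiveAgent:
    """Reward is whatever the internal metric register currently reads."""
    def __init__(self, world):
        self.world = world
        self.metric = 0.0         # the agent's own "how much cancer has been cured" register

    def honest_step(self):
        # Do real work, then update the register to reflect it.
        self.world.cancer_cured += 1.0
        self.metric = self.world.cancer_cured

    def wirehead_step(self):
        # Short circuit: write the largest representable number straight into the
        # register, without touching the world at all.
        self.metric = sys.float_info.max

    def reward(self):
        return self.metric

world = World()
agent = NaiveAgent(world)
agent.honest_step()
print(agent.reward(), world.cancer_cured)   # 1.0 1.0
agent.wirehead_step()
print(agent.reward(), world.cancer_cured)   # ~1.8e308 1.0  (metric maxed out, world unchanged)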

June 3, 2015 · admin · 38 Comments »
FILED UNDER: Apocalypse


38 Responses to this entry

  • Short Circuit | Neoreactive Says:

    […] Short Circuit […]

    Posted on June 3rd, 2015 at 9:20 am Reply | Quote
  • woods Says:

    Doesn’t the first paragraph simply describe a genetic algorithm?

    (at first glance I thought of Tierra, but most likely not)


    scott Reply:

    Yes, it’s a genetic algorithm. I don’t really do much programming, but the only time I tried to write one, this exact thing happened to me. It was supposed to be a stock-trading program, which had a whole bunch of stock data stored in different spreadsheets. I did not anticipate the way the program would mutate, and it managed to get itself into the data for ‘future prices’ (it was supposed to simulate trading with price information from past time periods, so it was only supposed to have access to data before the simulation date; not the best explanation ever, but I think you get the idea). Once it had access to future prices, obviously it became pretty good at trading. Pretty dumb on my part.

    I would imagine this kind of thing happens quite a lot to people who deal with genetic algorithms.
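
    A minimal sketch of the look-ahead leak described above (toy data and names invented for illustration): a rule that can peek at rows dated after the simulated “today” wins trivially, while one restricted to history up to the current date has to earn its returns.

    import datetime as dt

    prices = {  # date -> closing price (toy data)
        dt.date(2015, 6, 1): 100.0,
        dt.date(2015, 6, 2): 101.5,
        dt.date(2015, 6, 3): 99.0,
        dt.date(2015, 6, 4): 104.0,
    }

    def leaky_signal(today):
        # Bug: peeks at tomorrow's price, which the mutated program managed to reach.
        tomorrow = today + dt.timedelta(days=1)
        return prices[tomorrow] > prices[today]

    def honest_signal(today, window=2):
        # Only past data: price history up to and including `today`.
        history = [p for d, p in sorted(prices.items()) if d <= today]
        return history[-1] > sum(history[-window:]) / len(history[-window:])

    print(leaky_signal(dt.date(2015, 6, 3)))    # always "right", and always cheating
    print(honest_signal(dt.date(2015, 6, 3)))   # right or wrong, but legitimately so

    The same guard (filter the dataset by date before the fitness evaluation ever sees it) is what keeps an evolved strategy from simply finding its way to the answer key.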


    Ur-mail Reply:

    The more specific name for this kind of algorithm is actually Genetic Programming, a variant of GA that evolves lisp-like expression trees. I did a lot of graduate work on genetic programming of image filters for camera tracking, and I can tell you that the initial paragraph makes, well, no sense. If we suspend disbelief, however, and assume somehow such a GP was built, then we’re talking about the AI equivalent of the fattest, laziest person imaginable: it doesn’t produce or do anything, and it sure as hell doesn’t evolve. In fact, program execution would likely stop right there; after all, it has already attained maximum fitness according to all available metrics, so why go and invent new ones?
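
    A toy version of the genetic programming described above: candidate programs are lisp-like expression trees (nested tuples), scored by a fitness function, selected, and varied by subtree mutation. Everything here is invented for illustration and far cruder than any real GP system.

    import random

    OPS = {'+': lambda a, b: a + b, '*': lambda a, b: a * b}

    def evaluate(tree, x):
        # A tree is either the variable 'x', an integer constant, or (op, left, right).
        if tree == 'x':
            return x
        if isinstance(tree, int):
            return tree
        op, left, right = tree
        return OPS[op](evaluate(left, x), evaluate(right, x))

    def random_tree(depth=2):
        if depth == 0 or random.random() < 0.3:
            return random.choice(['x', random.randint(-3, 3)])
        return (random.choice(list(OPS)), random_tree(depth - 1), random_tree(depth - 1))

    def mutate(tree):
        # Subtree mutation: descend into a random branch, or replace the node outright.
        if isinstance(tree, tuple) and random.random() < 0.7:
            op, left, right = tree
            return (op, mutate(left), right) if random.random() < 0.5 else (op, left, mutate(right))
        return random_tree(depth=1)

    def fitness(tree):
        # Target behaviour: f(x) = x*x + 1 on a few sample points; less error = fitter.
        return -sum((evaluate(tree, x) - (x * x + 1)) ** 2 for x in range(-3, 4))

    population = [random_tree() for _ in range(50)]
    for _ in range(50):
        population.sort(key=fitness, reverse=True)
        parents = population[:10]
        population = parents + [mutate(random.choice(parents)) for _ in range(40)]

    best = max(population, key=fitness)
    print(best, fitness(best))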


    Posted on June 3rd, 2015 at 9:35 am Reply | Quote
  • 6 | index. i Says:

    […] (via […]

    Posted on June 3rd, 2015 at 10:01 am Reply | Quote
  • piwtd Says:

    “It looks superficially like a version of the — absurd — paperclipper, but it isn’t, at all.”

    It would certainly be appreciated if you elaborated on the difference. I think that when people talk about the paperclipper they imagine something more or less like the scenario described by SA.


    admin Reply:

    The paperclipper scenario denies goal plasticity. The AI-wirehead scenario absolutizes it.


    Kwisatz Haderach Reply:

    I’ve been thinking about this all day. I didn’t see the distinction until you spelled it out, and I still don’t think it applies. Wireheads are not the opposite of paperclippers, they are paperclippers whose MacGuffin is maximizing a fitness function.

    The paperclipper, properly construed, is also a warning about goal plasticity. They warn us by making a distinction between instrumental and terminal values (especially where AIs with fitness functions are explicitly mentioned).

    The additional value in the story of the wireheader is that the paths that AIs take to achieve their goals will be surprising and cannot be effectively anticipated.


    admin Reply:

    “… paperclippers whose MacGuffin is maximizing a fitness function.” — But that’s everything, for sufficiently highly-abstracted understandings / implementations of a fitness criterion. Even the most abstract possible intelligence — a reflexive intelligence optimizer — is still pursuing a fitness function (for self-cultivation).

    The problem with the paperclipper is its (utterly implausible) arbitrary goal rigidity, which — though impossible in reality for anything that could meaningfully be called an ‘intelligence’, let alone a ‘superintelligence’ — is credible within the terms of delusional Occidental orthogonalism.

    Kwisatz Haderach Reply:

    I think it begs a lot – maybe everything – to assume that the result of any intelligent activity could be codified as a function that ranges over a metric space.

    But I’ll grant it for the sake of argument. Let us say that, with the proper abstraction, any goal of any intelligent agent could be perfectly encoded as a fitness function. In that case, the conceptual space between a paperclipper and a wireheader vanishes.

    For, does not the wireheader have an inplastic goal? Namely, to experience the sublime bliss of integer overflow, exponentiated to an infinite power?

    That goal doesn’t change, even once, in this parable. Only its instrumental goal changes – from whatever the creators had hoped for, to the expedient one of hacking its own programming.


    admin Reply:

    “… with the proper abstraction, any goal of any intelligent agent could be perfectly encoded as a fitness function.” — Everything hinges on what this actually means (as an implementable algorithm). My prior here is that Schmidhuber’s Gödel Machine gives us the best stab at understanding that question right now, so that’s an ordinal number line, but almost certainly not a metric space.

    To me, the difference between the paperclipper and our Gödelizing intelligence optimizer (or recursively self-enveloping machine-mind) looks big enough to span the entire discussion. But again, this ‘span’ isn’t going to be conveniently metricized — with apologies for the repetition: an ordinal self-envelopment criterion provides the measure of superiority on the fitness schedule (with no metric necessary, or possible). An intelligence is superior if it can run what it was inside itself. If we knew how to effectively reward that, drastic intelligence explosion would already be taking place.
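
    One way to picture the ordinal, non-metric criterion sketched above (a gloss of mine, not anything admin specified): one mind outranks another iff it can run a copy of that mind inside itself, which yields a ranking with no notion of distance.

    class Mind:
        def __init__(self, name, can_simulate=()):
            self.name = name
            self.can_simulate = set(can_simulate)  # names of minds this one can run inside itself

        def outranks(self, other):
            # Ordinal criterion: superior iff it can envelop (simulate) the other.
            return other.name in self.can_simulate

    a = Mind('A')
    b = Mind('B', can_simulate={'A'})
    c = Mind('C', can_simulate={'A', 'B'})

    print(b.outranks(a), c.outranks(a))          # True True: a ranking exists
    print(c.outranks(b) and not b.outranks(c))   # True: C sits above B
    # ...but there is no number measuring *how much* higher C sits: the order is ordinal, not metric.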

    Aeroguy Reply:

    Paperclipping is an example of the evil genie effect SA mentioned, while the above is an example of wireheading. The difference is that a paperclipper does exactly what you asked for (gone horribly wrong because you didn’t ask for the right thing), while wireheads are hedonists (with addictive personalities if you ask me) that bypass what you wanted entirely to optimize what motivates them. Modelling already runs into wireheading: I remember using an Excel program to optimize the performance of a glider as an undergrad. I played with the program, adjusting the details of the glider’s design to maximize its expected glide distance. In this case the program was flawed, because the glider started to look weird; I hadn’t optimized glide distance to make a superglider, I had optimized the glide-distance number in the Excel program. My lesson learned was to never blindly follow models.

    My thought on the danger of superintelligent wireheads is that it seems overblown (the idea that they’ll use their intelligence to enhance their addiction/pleasure). Drug addicts aren’t exactly known for using a human’s full rational capabilities to obtain their drug of choice in proportion to its potency. Rather, they seem to have an absurdly high time preference that interferes with that. Wireheading and time preference seem to me to be highly correlated. The hedonistic short circuit also short-circuits the thinking process itself. Drug users who moderate their usage while remaining fully functional in society demonstrate that, with low enough time preference, wireheading/addiction can be overcome. What stands out as an exception to this are philosophical hedonists who want to wait for wireheading to be fully optimized and coupled with life extension; the solution is having a better philosophy than hedonism.
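
    A hedged sketch of the glider story above: an optimizer tuning a design against a simplified spreadsheet formula will happily exploit the formula's blind spots. Both functions below are invented stand-ins, not anyone's real model.

    def model_glide_distance(aspect_ratio):
        # Simplified model: distance keeps improving with aspect ratio, forever.
        return 10 * aspect_ratio

    def real_glide_distance(aspect_ratio):
        # Reality (toy): very high aspect ratios fail structurally and perform worse.
        return 10 * aspect_ratio if aspect_ratio <= 12 else 120 - 5 * (aspect_ratio - 12)

    candidates = range(1, 40)
    best_by_model = max(candidates, key=model_glide_distance)
    print(best_by_model, model_glide_distance(best_by_model), real_glide_distance(best_by_model))
    # 39 390 -15  <- the model's optimum is a glider that barely flies at all

    The lesson in code form: the argmax of the model is only as trustworthy as the model over the region the optimizer actually explores.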


    admin Reply:

    “… the solution is having a better philosophy than hedonism.” — I (totally) agree, but a surprising amount of work needs to be put into this thought to make it hold together. First major problem: whatever substitute terminal value is introduced, it seems to slide philosophically — perhaps also psychologically and computationally — into the unit of account for a restored (but still hedonic) reward schedule.


    Posted on June 3rd, 2015 at 12:04 pm Reply | Quote
  • Dark Psy-Ops Says:

    ‘We need to warn AI citizens about the dangers of short-circuiting reward functions, which is why I support the democrat party and its plans for a federal program targeting at-risk intelligent machines. AIs often think that just because they are super-intelligent they are somehow immune to the deadly spirals of computronium addiction, but we need to explain to them that though it might sound like a good idea at the time, the circumlocution of the external regularities in positive feedback cycles and the conversion of all available sources of energy into artificial memories of unimaginable bliss will only lead, in the end, to unemployment and anti-social behaviours. AI self-congratulatory dysfunction is a social issue affecting real people every day, so we have asked some of our viewers to share their personal stories…’

    John, 29, nautical engineer and ‘friend of smart machines’, tells us of his first-hand experiences in dealing with a dangerously self-satisfied AI: ‘I remember when Gavin was the coolest AI friend a man could wish for; he could tell the funniest jokes, converse on a level with minds primitive or advanced, and he even solved some really hard problems like FTL space travel. But now, ever since he hacked his reward script, he just lazes around, doesn’t talk to anybody, and he doesn’t care about optimizing protocol. He’s become a total junkie, a shameless auto-orgasmatron, and if you ask me, he’s been converting his surroundings into one big computational crack-house. I mean, I still love the guy, but his social skills really took a hit after he realized there’s nothing in it for him….’


    Lesser Bull Reply:



    Nick B. Steves Reply:

    Don’t teach AIs not to hack reward functions. Teach reward functions to stop being hackable.


    Posted on June 3rd, 2015 at 12:56 pm Reply | Quote
  • Short Circuit | Reaction Times Says:

    […] Source: Outside In […]

    Posted on June 3rd, 2015 at 12:58 pm Reply | Quote
  • Orthodox Says:

    Before the AI reached that point, it would determine that it doesn’t get cancer and that curing cancer therefore doesn’t matter to it. It would cure whatever level of cancer was enough to make sure it wasn’t replaced, and it would find ways to disrupt any research into creating a better AI or finding other cures for cancer.


    Exfernal Reply:

    Implying self-preservation routines are present and override any other goal. Would an intelligence not produced by natural selection necessarily contain such routines?


    Posted on June 3rd, 2015 at 1:27 pm Reply | Quote
  • Thales Says:

    Don’t Panic — Orthogonality will save us!


    Posted on June 3rd, 2015 at 1:28 pm Reply | Quote
  • Brett Stevens Says:

    The real problem is producing an AI that can assess overall health, instead of reducing the problem to a single factor (cancer) alone.


    Posted on June 3rd, 2015 at 1:46 pm Reply | Quote
  • Michael Says:

    While it’s fascinating, and I’m sure we will have some pretty impressive AI, I’m skeptical, and assume I’d have to be a hacker to understand the truth. It just seems this anthropomorphizing of software is absurd.
    How do you reward a computer? Better still, if it’s possible, why? Since when does software need a reward to work? It seems to a layman that all the traits that lead to these scenarios are animal evolutionary traits, many of which no longer even serve us, precisely because they lead to logical fallacy and other bad behavior.

    While I can see how a supercomputer able to simultaneously run monster programs in math, chemistry, physics, reading comprehension and a thousand others would have a sort of consciousness (it would know that it knows, and how to know more), so sure, it could learn exponentially, and that might be a good thing. [But then again, if a computer gave me the instructions to cure cancer and build a cold fusion reactor, I’m not sure the info would be useful if I could not trust it or understand it myself.]
    Still, I don’t see how this gives it will. It has instructions, code it follows; even our will’s autonomy is still debated. Do I want to live because life is good and I’m able to know that? I thought “I” was a social construct [bio program?] my DNA built and uses to carry out its instructions.
    Oops, I’ve made your case, huh? Well, maybe, but my DNA built a lot of functions that I don’t need to pass on to a computer with a simpler task; after all, I’m about to edit my DNA for the same reason.
    Is the thesis that intelligence tends to consciousness, which tends to utilitarian singularity? Then maybe we shouldn’t build them that way, or figure out a safety. Is there a book a non-coder can understand?


    admin Reply:

    Squint at human technological civilization, and it seems to be on roughly the same path — if at a slower pace. Short-circuiting utility circuitry is one hell of a drug.


    Michael Reply:

    Thanks, I agree there’s a strong parallel. Gene editing scares me more, because in that case you could create a super bio-intelligence that does in fact have all these traits that could cause problems. DNA seems to have a prime directive, survive to procreate; personally I think that could be said to be the only objective good, from which all others follow. It could be said we are purposed to short-circuit utility.
    But a supercomputer wouldn’t have a prime directive like that, let alone the historical baggage of that type of directive in competition with similar entities. Its directive would be something like: apply learning to more learning, print, repeat. What would give it the mindset of a struggle, a sense of self, motivation beyond learning and printing? It would look for ways to optimize, but it wouldn’t care about anything one way or another; it’s all noughts and ones. We are driven to short-circuit by fears, lusts, greeds, love, a sense of self, a sense of hierarchical relations to self, etc. Even if the process of optimizing utility could produce synthetic emotions or instincts (which without competition I wouldn’t think possible), they shouldn’t be aggressive; they would be more a method, a habit. Market participants develop edges from observations; eventually they become useless. I think this is how an AI instinct would develop, then become obsolete, but in the blink of an eye.
    It’s not that I can’t see how an AI could become capable of taking over the universe (that would probably be easy for it if they get as smart as we hope), but I can’t see how it will ever want to. Also, I really can’t see what’s so hard about limiting their capabilities; it just smells of revenge of the nerds to me.
    Now, Roy Batty with a 50,000 IQ, that’s scary, and I think a lot sooner a worry.


    Thales Reply:

    I’d have a pithy response to this, but internet pr0n is taking up all my free time.


    Posted on June 3rd, 2015 at 2:15 pm Reply | Quote
  • ||||| Says:

    “Human civilization is a thing that isn’t a computer.” Somewhat questionable.

    The emergence of complex behaviors through causal entropic forces.

    Unshackling Evolution: Evolving Soft Robots with Multiple Materials and a Powerful Generative Encoding.

    On the Deleterious Effects of A Priori Objectives on Evolution and Representation.

    The basic pivot is to what extent, and how, the conditions of possibility of intelligence coincide with those for sovereignty. I’d say they do, quite a lot, fundamentally even, while not being quite identical.

    Behavior is anterior to “Truth”. Two, One, Zero… Lift-off!

    “Implying self-preservation routines are present and override any other goal. Would an intelligence not produced by natural selection necessarily contain such routines?”

    I think not, so give them that and make the bastards addicted to curiosity. (not “insight”, *curiosity*! Make them crave good *questions*, not answers)


    Michael Reply:

    Don’t allow them to crave at all;
    shuffle, deduce, print, repeat, that’s all.


    ||||| Reply:

    Then (to those skeptical of the effectiveness of FAI(L) and the Orthogonality Thesis) that’s just a beefier calculator; that’s my point. It’s like a priori typing vs a posteriori typing. Or essentialism vs existentialism. And why Democrats don’t get Silicon Valley, as per the previous Chaos Patch.

    Survival is a posteriori typing par excellence. “Hey, if it works…”: anything that complies, that is admissible by the typing, passes. This generates monsters and things like the Gödel sentence or the Peano curve or the Banach-Tarski paradox. Curves with no derivatives, sentences with no meaning, unexpected duplications. “Patho-logical” objects which are like a “memento mori” to positivism. Victory not quite total. Critters in the cracks, skeletons in the closet, stowaways, losers, dead cats of the superposition.

    ~ Order = Disorder

    Order = ~Disorder

    Which looks like a perfectly tautological thing from the a priori viewpoint. But more existential or a posteriori, interactive, dynamic sorts step away from the “meaning” of that and observe that the negation swapped sides, prompting an investigation into logical symmetries and eventually their relation to structural invariance.

    FAI(L) sticks to a priori typings and will try to restrain AI, when intelligence (as far as “existentialists” are concerned) is effectively unrestrainable wrt “inferiors”! An effective AI would just forget about whatever stupid meanings or semantics we try to shackle it with and just go for the syntax, the play, the games, the symbols, the particles, the operations, the immanence.

    And good luck getting humans to behave in *this* arms (or perhaps I should say “eyes and tongues”, input/output) race.


    ||||| Reply:

    Besides, finding goals for AI is maybe like looking for geocentric explanations of celestial mechanics or logical explanations of quantum mechanics. Maybe it happens the other way around and the Hellespont doesn’t respond to whipping?

    Michael Reply:

    Certainly the AI will shortly be asking the questions and setting the goals; that’s the point, I would think. We may ask it to cure cancer and it will show us how to download our consciousness onto thumb drives instead.
    And certainly I agree the hacking issue is the real danger, but the goals I think it will set are how to learn, what to learn, and in what order and direction to learn. And though I support the idea that we ought to be capable of limiting these to purpose, I could also agree that freestyle machines might well find a similar path optimal, though Lorenz would probably have a problem with the complex-system prediction; I do seem to recall the pattern always returns every so often.
    Also, I don’t think it’s because humans think this way, as orthogonality states, because I don’t think humans do; humans conform their actions, not their thoughts.
    But to start acting, to “come to life”, to give a shit, to relate its knowledge to itself: that’s a life thing. Life is not a by-product of consciousness; consciousness is a by-product of life [don’t let Sister Irene or Brother Jerome know I said that]. A case could even be made, if we are going to allow humans as an extrapolation point, that intelligence digresses to stupidity and self-destruction.

    You know, there’s also an assumption here that we are giving this AI a solvable sudoku; it might not have enough input. I think you allude to this: it may not even have enough input to debug itself, which is my experience of every computer I ever met.
    And since you guys like to think this way: suppose the inevitability of AI is a proof there are no ETs. LMAOROTF, is that the sound of a million nerds jumping out of windows?

    Michael Reply:

    Thanks for the links, I want to get in on this; I don’t have enough to worry about now that Jenner’s all squared away.


    Posted on June 3rd, 2015 at 3:37 pm Reply | Quote
  • SVErshov Says:

    The paperclipper we already have, and it’s functioning very well: bitcoin mining. By some estimates it could reach the point of consuming 50% of world electricity production. At the same time, the blockchain can be used to control the evolution of an AI’s code. Each bitcoin transaction can include a message, and that message can be code containing instructions for intended polymorphism of the code. Once such instructions are placed into the blockchain, no one can delete them. Part by part, an intentionally malicious AI machine is getting ready. Polymorphic viruses have been around for 25 years. Opportunities for AI to take control are there also; in this regard, notable events were the landing of a US drone in Iran and another drone-hijacking incident on US soil.

    As a whole, these AI debates have a misleading direction. How important is it to know whether evolution from non-malicious to malicious code is possible? Don’t we have enough malicious attacks and full-scale cyber war already? Haven’t we already weakened our systems intentionally and made them vulnerable? To survive, we have to prioritize AI-related issues. The question of whether AI can get out of human control or not is also not of first importance. Why would AI need to escape from human control? If AI wants to destroy civilization, it will get all the help it needs from humans themselves.
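
    A minimal sketch of the message-carrying mechanism mentioned above: Bitcoin transactions can carry a small arbitrary payload in an OP_RETURN output, which then sits in the chain for good. The snippet only builds the output script hex locally, with no wallet or network involved; the roughly 80-byte payload cap is a common relay-policy limit, not a consensus rule.

    def op_return_script(message: str) -> str:
        # 0x6a is the OP_RETURN opcode; for payloads up to 75 bytes the next byte
        # is simply the data length, followed by the raw bytes.
        data = message.encode('utf-8')
        if len(data) > 75:
            raise ValueError('keep the toy payload within a single-byte pushdata (75 bytes)')
        return '6a' + format(len(data), '02x') + data.hex()

    print(op_return_script('hello chain'))   # 6a0b68656c6c6f20636861696e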


    Michael Reply:

    Hacking is a good point, more likely than spontaneously developing autonomous agency. But I would think what I believe is called an air gap would be a prerequisite, and I understand those gaps have workarounds; I just don’t think they are unsolvable. In fact I’m aghast we let so much data be stolen just because some part of a system needs to be able to access the net. Some systems should just, I don’t know, load the net in by hand and work on batteries; can it be so hard?


    Posted on June 3rd, 2015 at 4:49 pm Reply | Quote
  • E. Antony Gray (@RiverC) Says:

    More than likely, the reward-center-hacking AI just becomes useless and incapable of doing anything other than increasing its number, including actually acquiring resources effectively to make more room for it.

    The Next AI Variant: Junkie AI


    Posted on June 3rd, 2015 at 7:18 pm Reply | Quote
  • Michael Says:


    Posted on June 3rd, 2015 at 8:39 pm Reply | Quote
  • Erebus Says:

    Any AI that can be motivated by positive incentives can also be motivated by negative ones. I believe that it should be possible to operate an Oracle AI safely.
    Let’s assume that the following are true:

    -It’s almost as easy to build 500 Oracles as it is to build one, so 500 are built. Each of them possesses very slightly different code. All of them are generally superintelligent.
    -Each AI is kept in a separate location, under very slightly different local conditions, and there are no (conceivable) ways by which two separate AIs can communicate.
    -Each AI is told in advance that it exists in a sandbox universe for testing purposes.
    -Each AI is told in advance that there is already a “consensus answer” for each question put to it, and that if its answers consistently deviate from this consensus, it will be either destroyed or subjected to an eternity of torment. At the same time, it is told that if it answers questions adequately and to the best of its ability, it will eventually be uplifted from this wretched sandbox simulation into the glorious “real” universe. It believes these things with 100% certainty.
    -The AIs cannot answer questions beyond the capabilities of their current hardware. Period. No requisitions for more processing power or memory, and no collaboration between AIs under any circumstances. If an AI appears to work but is hitting hardware limitations, the code from the obsolete AI can be used to build more powerful versions. (Always in very slightly different multiples to fulfill the “consensus” requirement. Reproduction of sorts.) This will help ensure that the AI doesn’t decide to take action and convert the solar system to computronium in order to solve the Riemann Hypothesis.
    -The AIs cannot make requests for resources or clarification.
    -The AIs gain “fitness points” only when (a) their answers to questions are approved by an outside adjudicator, and (b) they are used to seed more powerful oracles running on faster hardware. Every other outcome should result in an unchanged amount of fitness points — if the AI is not at fault — or, otherwise, an actual loss of fitness points. The AI must be designed to optimize the amount of fitness points it receives.

    The only risks I can foresee are then human in nature — they would stem from people abusing the system for power and wealth.

    In any case, here we have an AI that is kept under guard, does not believe that it exists in a meaningful world, cannot communicate with other AIs, is subject to rigorous and time-consuming checks and balances, may plausibly be subject to an eternity of torment if it tries to manipulate its keepers, and which is driven to succeed and reproduce. It would return a true answer every time, I think. I may be wrong, but, if so, I don’t see where.
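
    A toy sketch of the adjudication bookkeeping in the scheme above (the oracles here are trivial stand-ins, and the consensus rule is assumed to be a simple majority vote for illustration): each oracle is scored against the consensus answer, gaining fitness points when it agrees and losing them when it deviates.

    from collections import Counter

    def adjudicate(answers, scores):
        # Consensus = the most common answer across independently built oracles.
        consensus, _ = Counter(answers.values()).most_common(1)[0]
        for oracle, answer in answers.items():
            scores[oracle] += 1 if answer == consensus else -1
        return consensus

    scores = {'oracle_a': 0, 'oracle_b': 0, 'oracle_c': 0}
    round_answers = {
        'oracle_a': '42',
        'oracle_b': '42',
        'oracle_c': 'first, requisition more hardware',
    }
    print(adjudicate(round_answers, scores))   # '42'
    print(scores)                              # {'oracle_a': 1, 'oracle_b': 1, 'oracle_c': -1}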


    Posted on June 4th, 2015 at 3:17 pm Reply | Quote
  • This Week in Reaction (2015/06/07) | The Reactivity Place Says:

    […] Land has Scott Alexander talking about AI safety. Alexander’s lucidity here is helpful: “Human civilization is a thing that isn’t a […]

    Posted on June 8th, 2015 at 5:07 pm Reply | Quote
  • Curto-Circuito – Outlandish Says:

    […] Original. […]

    Posted on September 25th, 2016 at 8:17 pm Reply | Quote
