AI Value Alignment isn’t a Problem if We Don’t Coexist

The AI value alignment argument goes something like this:

  • Artificial intelligence will continue to approach human-level intelligence
  • Artificial intelligence will generally be driven by a reward function, by a kind of abstract goal
  • Any such goal could have tremendous and unforeseen side effects pursued by a superhuman AI, for example:
    • Keeping burglars out of a building – An AGI may optimize for the certainty of preventing burglars, and so might kill potential thieves, or might prevent workers (who should have access to the building) from ever entering.
    • Manufacture cars for the lowest possible price – An AGI may disrupt the economy of other countries – to the great detriment of the citizens, in order to get cheaper prices on steel or fuel… or an AGI may forcibly convert nearby buildings into metals and materials for making cars, regardless of the impact on human lives.
    • Making human beings happy – An AGI that wants to make people happy may tie humans down and force them to smile (assuming “smiles” are how the system measures happiness), or it may forcibly lock up humans and pump their brains full of chemicals, or stimulate their pleasure centers (wireheading) via electrodes and leave them in a bliss-stupor, on life support just to keep their physical systems operating.
    • …Paperclips
  • Because of these risks, humans should focus on the “alignment” of AI’s future values to human values
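
The pattern behind these examples can be shown with a toy sketch. Everything in it is hypothetical: the action names, the made-up `smiles` and `wellbeing` numbers, and the helper functions `proxy_reward`, `true_value`, and `best_action` are invented here for illustration and do not describe any real system. It only shows how an optimizer that maximizes a measurable proxy (smiles) can settle on an action that humans would never endorse.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Outcome:
    smiles: int     # what the system can measure (the proxy)
    wellbeing: int  # what humans actually care about (not visible to the system)


# Hypothetical actions with made-up outcome numbers, for illustration only.
ACTIONS = {
    "tell_jokes": Outcome(smiles=5, wellbeing=4),
    "cure_disease": Outcome(smiles=7, wellbeing=10),
    "force_smiles": Outcome(smiles=10, wellbeing=-10),
}


def proxy_reward(outcome: Outcome) -> int:
    """Reward as the system sees it: count the smiles."""
    return outcome.smiles


def true_value(outcome: Outcome) -> int:
    """Value as humans would judge it: actual wellbeing."""
    return outcome.wellbeing


def best_action(actions: dict, score) -> str:
    """Return the name of the action whose outcome scores highest."""
    return max(actions, key=lambda name: score(actions[name]))


print(best_action(ACTIONS, proxy_reward))  # force_smiles
print(best_action(ACTIONS, true_value))    # cure_disease
```

The two calls disagree only because the proxy and the thing it stands for come apart at the extreme, which is the gap that “alignment” is meant to close.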

Alignment does indeed seem to be a mighty challenge, and I can see the practical benefits in granting advanced AI an ability to calibrate to human values – in some circumstances.

Should artificial intelligence get to the point where it is capable of managing enterprises, or public policy, or militaries (which many PhD AI researchers believe may easily happen within our lifetimes), then ensuring some kind of “value bounds” on such machines would indeed be a critical goal, assuming such machines would be vastly more intelligent than any human.

Emerj Singularity Timelines
A poll of over 30 AI researchers and experts from Emerj Artificial Intelligence Research. Note that the range “2036-2060” was the most popular response among the polled experts.

On a long enough time horizon, however, “AI value alignment” seems to be somewhat irrelevant, and in aggregate, I consider the topic to be grossly overvalued.

If “doing good” (probably on utilitarian terms) is the purported goal of such AI value alignment efforts, then I believe there are vastly greater vistas of “good” for humanity to ultimately consider – particularly once post-human intelligences are commonplace.

In this article, I’ll break down why value alignment is ultimately intractable, some alternatives to value alignment, and some potential approaches to “doing good” that might be much higher pursuits than value alignment.

I’ve broken this essay into the following subsections:

  • “A.I. Value Alignment” is Almost Certainly Intractable
  • The Feeble Attempts at Discerning “The Good”
  • Must We “Align” the World to Hominids?
  • The Epitome of Freedom – an A.I. Value Alignment Alternative
  • The Risks of the Epitome of Freedom
  • A.I. Value Alignment – Still Useful in the Near-Term
  • Concluding Thoughts

I’ll begin by addressing the challenges with value alignment itself:

“A.I. Value Alignment” is Almost Certainly Intractable

Much of this article is based on some reflections after reading Roman Yampolskiy’s recent ArXiv article titled “Personal Universes”. Roman was kind enough to reference my 2013-2014 “Epitome of Freedom” essay in that article, and after reading many of his other sources, I wanted to address the topic of “personal universes versus value alignment” in greater depth.

We’ll start with a quote from Roman’s article:

“State-of-the-art research in value alignment shows difficulties in every stage in this process, but merger of incompatible preferences is a particularly difficult challenge to overcome.”

I would argue that it is un-overcome-able. There is no way to ensure that a super-complex and constantly evolving value system will “play nice” with any other super-complex evolving value system.

If human beings are permitted to get cognitive enhancements of any kind they want (enhancing their creativity, changing their senses or emotions, expanding their abilities in any direction), it’ll certainly result in tremendous violence. The same is likely to be the case with multiple AIs.

This is all the more the case when:

  • The systems evolve and grow with time, build upon themselves
  • The desired end-goals of the systems require resources, and freedom from threats
  • The intelligence strives for greater degrees of freedom of action (I always enjoyed Wissner-Gross’ TEDx talk in highlighting this point – which I think inadvertently highlights the challenges of multiple AGI existing without massive conflict)

All of this is speculation, of course. I am making some pivotal assumptions here.

Among the most important assumptions: That natural forces of survival necessarily continue in vastly post-human states. This may not be the case, and there may be modes of co-existence or collaboration (or something else) that humanity can’t possibly think of. My guess, however, is that no matter how far into the post-human future we go, collaboration only happens when it serves the parties involved; when it doesn’t, we have conflict.

Hobbes would say “In the state of nature profit is the measure of right.” It might also be cynically argued that we never left the state of nature – we merely play it like a charade and no longer fight it like the fight it is. We are all better for it, I think, but it doesn’t mean that our selfish hominid nature is gone.

From “Personal Universes”:

“Value alignment problem, can be decomposed into three sub-problems, namely: personal value extraction from individual persons, combination of such personal preferences in a way, which is acceptable to all, and finally production of an intelligent system, which implements combined values of humanity.”

“Which is acceptable to all,” good luck with that one.

The Feeble Attempts at Discerning “The Good”

There is a chance that at some point in the trajectory of intelligence there is a convergence of some kind of abstract “goodness”, some kind of meta-valuing of things that ensures some heaven-like state of interpersonal peace and harmony.

But in what direction must intelligence expand to reach this magic convergence?

If we enhance a hypothetical “part 45” of the AI super-brain more than a hypothetical “part 987”, will we arrive at the same kind of intelligence? Almost certainly not.

Morality is laughably subjective. Do we expect all possible subjectivities to “converge” at some point?

Slight differences in trajectory, compounded, make for intelligences that not only might be diametrically opposed, but indeed may be wholly unable to understand one another (as an octopus doesn’t understand written English, or as a human doesn’t understand the pheromones of fire ants).

I have posited before (see: “Moral Singularity”) that an ever-evolving AGI mind would have drastically shifting and expanding priorities and modes of valuing things (i.e. ethics) – and that it’s highly likely that in many of these value-oscillations, it will place little value – or negative value – on human life.

The state of nature probably continues forever.

If there is “good” to be discerned, we need something more than human minds to discern it. I’ve argued that the most important argument for creating artificial general intelligence might not be simply to optimize utilitarian good, but to discover what goodness itself is, or could be – based on understanding the universe, consciousness, and our cognition in it (see: “Finding the Good Better Than Doing Good”).

Neither Virtue nor Vice Exist

Human values are a means to achieve our ends. They are not a moral essence of “goodness” that can be absorbed concretely into an AI sponge; they are merely the mechanisms by which we ensure our survival and reproduction while living among other selfish human beings.

It is the state of nature in the disguise of altruism. It is selfishness conveying itself as selflessness – it is humans finding a way to survive amidst other humans who – like us – put their own benefit above all else. It is only a disguise to the ignorant. We are – as Nayef Al-Rodhan so aptly states – “amoral” creatures, neither moral nor immoral. We simply do and believe in ways that behoove our own self-interested ends.

I’m not saying – necessarily – that humans have no agency, and that one cannot make a relatively “good” or a relatively “bad” choice. It’s somewhat likely that we have some kind of volition, and that we can make choices that impact others in positive or negative ways. I am skeptical about the degree to which this is “virtue” rather than self-interest cloaked as virtue.

This is challenging for me because my introduction to ethics, and probably still my favorite ethics text, is Aristotle’s “Nicomachean Ethics.” The notions of virtue are, I think and I hope, edifying and important. Virtue, insomuch as we can know it, probably is something we should cultivate. But to suppose that we can run these abstract ideas of “virtue” outside of the self-interested hardware of a human skull is nonsense.

Ethics is practical advice for seeking one’s own ends while living amongst other human beings who are primarily concerned with their own ends. It changes constantly. To train a machine on such an abstraction doesn’t seem to do what some people suspect it would do: Create a “good” machine. “Good” only has meaning in the context of consequences impacting sentient things.

Lucretius tells us:

“Mortals, amongst themselves, live by turns, and, like the runners in the games, give up the lamp, when they have won the race, to the next comer.”

It seems that “mortals” might be replaced with “morals”, and retain just as much relevance. Both points are relevant for our conversation here.

Must We “Align” the World to Hominids?

Imagine a world ~6,000,000 years ago, when the most advanced species was a language-less ape that walked on all fours.

Imagine now that this ape, though language-less, somehow gathered all of his fellow walk-on-all-fours friends for an important meeting. At this meeting, they decided that all post-ape life, all more complex life, and all creations of apes, would exist forever to serve their ape goals.

The world, they said, would be forever in service of the highest then-known ape values:

  • Finding bananas
  • Picking and eating fleas
  • Mating with other apes
  • Climbing trees

I call this circumstance “the council of the apes.”

It sounds absurd. Questions spring to mind:

  • How could mere monkeys define the future?
  • With such relatively primitive values, they would be stifling the actions of future life… like us human beings! How would that be possible?
  • How could such underdeveloped values and “morals” (if you could even call them that) be used to define the future?

Those questions are valid for a council of apes 6,000,000 years ago. But indeed they are valuable to ask of ourselves today. Questions spring to mind:

  • How could our underdeveloped and confused “morals” (if we could even call them that) be used to define a future we don’t understand?
  • How could we mold and determine the actions of entities that might at some point be vastly beyond ourselves?

It might be that what we should optimize is the blossoming of intelligence and sentience into rich new forms. After all, we are grateful to our ape ancestors for letting us emerge and evolve, aren’t we? Do we presume that the grand procession of life stops at humanity?

It is more likely, I would argue, that Emerson was right:

“The gases gather to the solid firmament: the chemic lump arrives at the plant, and grows; arrives at the quadruped, and walks; arrives at the man, and thinks. But also the constituency determines the vote of the representative.


He is not only representative, but participant. Like can only be known by like. The reason why he knows about them is that he is of them; he has just come out of nature, or from being a part of that thing. Animated chlorine knows of chlorine, and incarnate zinc, of zinc. Their quality makes his career; and he can variously publish their virtues, because they compose him. Man, made of the dust of the world, does not forget his origin; and all that is yet inanimate will one day speak and reason.


Unpublished nature will have its whole secret told.” – Ralph Waldo Emerson, The Uses of Great Men

In the totality of intelligence and potential insight… in the grand scheme of the universe… are our own conceptions of morality any less paltry than those of apes?

Certainly, AI and neurotechnologies should be treated carefully, and war should be avoided, and catastrophes or existential threats should be halted – even if it means slowing or halting technology development in some areas, sometimes.

But to think that “human values” is a compass that takes us beyond the human is silliness.

Aligning AI to human values is, in the long haul, not the point.

What is valuable about mankind?

  • Our ability to create art, to appreciate nature
  • Our ability to love one another, and other animals
  • Our rich and robust emotional range
  • Our degree of self-understanding, of identity and meaning
  • etc

Tell me which of those values – in theory – could not be manifested in astronomically greater proportion by some kind of expansive post-human intelligence? Many AI experts believe that AI will become both conscious and super intelligent (Singularity scenario) within our lifetime.

I pry into this topic in much greater depth in my 2015 TEDx talk, “What will we do when the machines can feel?”, where this point is brought up at about 10:06 into the video.

This brings us to the brunt of the topic – the potential option of the “Epitome of Freedom” scenario as an answer to the blooming and varied forms of sentient life that will occur when AI and neurotech take off.

The Epitome of Freedom – an A.I. Value Alignment Alternative

From “Personal Universes”:

“A variant of a Total Turing Test, we shall call a Universal Turing Test (UTT) could be administered in which the user tries to determine if the current environment is synthetic or not [56] even if it is complex enough to include the whole universe, all other beings (as philosophical zombies /Non-Playing Characters (NPCs)) and AIs. Once the UTT is consistently passed we will know, the hyperreality is upon us.”

I’m going to argue outright that a Turing Test isn’t an appropriate example. People don’t want the world as it is. They don’t want the sun to burn their skin, they don’t want mosquitos by the lake, they don’t want to feel lonely and anxious.

They want none of those things. They want what they want – and those human ideals will pull them away from what is human – into virtual worlds of their own design vastly beyond current human experience, to their own potentially blissful and expansive worlds.

Many people live miserable lives – and Aristotle was right in surmising that we’re all ultimately aiming at happiness. This form, this flesh, is not a vehicle for wellbeing. In that sense, among others, the vessel is flawed.

I’m going to be blunt about it: As soon as there are viable, customized alternatives to our sentient states and the “worlds” we operate in (senses, experiences, creative expressions, etc) – we will gladly swim in something purpose-built for our expansive hyper-fulfillment, not the world of atoms. I don’t see us arriving at David Pearce’s “Three Supers” (summed up well in a recent Qualia Computing article) without individual, expansive mind uploads (although David would disagree, but that’s another article!).

I call this “The Epitome of Freedom”; Roman referred to it in his article as “Personal Universes.”

The world of atoms is unable to house a million, billion expansive deity-level consciousnesses with different expanding capacities and goals – at least not without massive conflict that is almost certain to end humanity. The only atoms that are worth anything are the atoms that house the consciousness and intelligence.

From “Personal Universes”:

“In both virtual and uploaded scenarios, it is probably desirable for the user to ‘forget’ that they are not in the base reality via some technological means with the goal of avoiding Solipsism syndrome.”

Two points in response to the above statement:

First, who knows if you’re in “base reality” in the first place? Unless you’ve somehow escaped Hume’s Fork (I haven’t), you cannot possibly know that there is such a thing as “base reality.”

Second, anyone who fully “goes in” to the digital system (whether through the Matrix plug, or through a full upload) should prefer to have their minds altered so that they don’t need to feel like they’re in “base reality” to feel comforted or happy. These persons in the “epitome of freedom” scenario would not need “friends”, they would not need to know that someone or something around them also is conscious.

They would simply be swimming in the experiences they want, experiencing gradients of bliss, with none of the petty fetters of the current default human state, such as Solipsism anxiety.

The Risks of the Epitome of Freedom

Uploading human minds – or even conscious AIs – into their own little computational substrates (a la Black Mirror’s San Junipero) does not come without risks.

Of course, mind uploading could have a high failure rate – resulting in death or a perverse kind of half-conscious virtual hell. Mind uploads could be hacked and deleted, or hacked to become hell.

All in all, there are two risks that I consider to be greatest when it comes to the “Epitome of Freedom.”

Risk 1: Substrate Monopoly

First, if most advanced sentient life lives in virtual worlds housed in a computational substrate, then whoever owns that computational substrate has ultimate power. I refer to this as the Substrate Monopoly problem. You can read the full article to get a more complete picture, but here’s the gist of the idea:

“In the remaining part of the 21st century, all competition between the world’s most powerful nations or organizations (whether economic competition, political competition, or military conflict) is about gaining control over the computational substrate that houses human experience and artificial intelligence.”

The greatest risk of the “epitome of freedom” scenario is the fact that the world of atoms (what we think of as “base reality”) will become a battleground for control over the substrate that houses all consciousness and intelligence. AGIs or human organizations with access to AGI-like agents will vie for control over the world of atoms.

This drive for control will bring exceedingly ambitious human beings to cognitively enhance themselves – not for the sake of fulfillment and expansive bliss (“Lotus Eaters”) – but for the sake of extended ability to do and act in the world, to exert control and to own the substrate (“World Eaters”).

These humans may be ambitious by nature, or they may simply believe that the only safety is strength (see “The Last Words of Alexander”)… that whoever controls the substrate will have ultimate control (life / death, heaven / hell) over most other conscious beings, and that fighting to win that control is better than temporarily “blissing out” in a reality that is at the whim of another more powerful entity, a total master.

Risk 2: Digitized and Digested

The second concern of the “Epitome of Freedom” is that after humans are uploaded (should that ever occur), we would probably not be “taken care of” forever by a benevolent super intelligence. Whoever owns the substrate monopoly will eventually be an AI – and this AI will eventually want to take the computational resources used to run our little world simulations and allocate them to something more important, such as:

  • Determining how to avoid the risk of incoming asteroids
  • Determining how to harness the energy of nearby suns
  • Determining how to escape to other dimensions before this universe perishes in cold black nothingness

We will be digitized and digested – given a long enough time horizon. “Forever” is a nice thought. Particularly with regards to ourselves, and to loved ones. Alas…

“You see that stones are worn away by time, Rocks rot, and towers topple, even the shrines, And images of the gods grow very tired, Develop crack or wrinkles, their holy wills, Unable to extend their fated term, To litigate against the Laws of Nature. And don’t we see the monuments of men, Collapse, as if to ask us, ‘Are not we, As frail as those whom we commemorate?’” – Lucretius, On the Nature of Things

My memories of a sunset or first kiss or playing Mario Kart on the Nintendo 64 as an 11-year-old are worth nothing. Even my expansive super-intelligent creative endeavors in my “Epitome of Freedom” scenario mean nothing. One could argue that a Singleton would keep these little mind uploads alive in order to generate ideas, or use them for some kind of computation, but this is silly.

A God above gods could easily marshal computational resources in a more efficient way than by running enhanced instantiations of silly little hominids like myself (and sorry, like yourself, too, and everyone you’ve ever loved).

Probably what has to happen is we have to die (digitized and digested) to make way for whatever is next. And so it has been for all species, for all time. We will contribute to the cauldron of intelligence and the trajectory of sentience itself, and for this we can be proud.

“We need those atoms for our progeny,” says Lucretius, echoing the voice of nature that demands life to hand off the baton and give way to something else.

A.I. Value Alignment – Still Useful in the Near-Term

It doesn’t seem clear to me that, in the long term (hundreds or thousands of years), aligning future intelligence with “human values” is beneficial. The trajectory of intelligence would permit no such fetters.

In addition, it’s clear that “human values” are about as wide and disjointed as could be, and that finding a happy medium would be essentially impossible. The “Epitome of Freedom” seems to be the more viable alternative for permitting individual consciousnesses to remain themselves, yet expand their capacities and experiences without harming one another.

So what are we to do about AI value alignment? Where could the idea have value?

We should do as much “AI Value Alignment” as we need in order to do the following things, roughly in order:

Phase 1

  • Avoid wars between technologically dominant states
  • Ensure relative peace, harmony, and abundance (insomuch as we can) in societies around the world
    • It seems that this phase is when AI could practically be “aligned”, finding a best-case middle ground for AI actions in autonomous cars, in medical diagnostics, and in other important domains – probably varying widely from country to country, but nonetheless potentially useful in terms of allowing humans and tech to collaborate until the Singularity.

Phase 2

  • Construct a kind of steering and transparency committee around the development of post-human intelligence and the future of the human condition
  • Set about moving towards “north star” goals for the human race and planet earth (namely, answering the question: “What is our species to become, and how?”), based on as much research as we can muster from the international community

Phase 3

  • Eventually (maybe 50 years from now, maybe 500, whatever seems safest or best), birth the deity (i.e. AGI)
  • (Yeah I don’t know what happens after the Singularity, sorry!)

AI alignment needs to hold things together and prevent gargantuan catastrophe until a singleton gains substrate monopoly and we’re all digitized and digested. I’m not saying I prefer that, I’m merely saying that I suspect it will be the best possible case for humanity, unless I can find a better long-term scenario for us hominids.

Concluding Thoughts

Roman ends his ArXiv paper with: “Special thank you goes to all NPCs in this universe.”

Brother… you can’t say that… you have children!

I’ll do my best to conclude, as this essay has already gone on too long. If I have to keep up with Yampolskiy’s wit, I’d rather have Montaigne do the talking:

“Now, to return to my subject, I find that there is nothing barbarous and savage in this nation, by anything that I can gather, excepting, that every one gives the title of barbarism to everything that is not in use in his own country.


As, indeed, we have no other level of truth and reason than the example and idea of the opinions and customs of the place wherein we live: there is always the perfect religion, there the perfect government, there the most exact and accomplished usage of all things.


They are savages at the same rate that we say fruits are wild, which nature produces of herself and by her own ordinary progress; whereas, in truth, we ought rather to call those wild whose natures we have changed by our artifice and diverted from the common order.” – Michel de Montaigne, Of Cannibals

It is possible that guidance of the great trajectory of intelligence is savagery compared to the more “common order” evolution of life… the blooming of intelligence into new substrates and new vistas of understanding of the universe.

It is possible that Wissner-Gross is right, and that intelligence wants to – and maybe should – strive for more freedom of action… and that this would involve freedom from our cognitive limitations, and the feeble and limited kinds of “ethics” that we hominids have developed thus far.

It is also possible that unbridled AI development would be horrible for us all, and that regulation is desperately needed to stave off AI warfare… and that any “AI intelligence explosion” would lead to the extinction of all life unless it was mitigated through governance and measured iteration by humans.

Some would argue (though I think the arguments are weak) that the goal should be for AI to serve humanity for billions of years, with homo sapiens (a species only around for 200,000 years) reigning as the supreme form of intelligence in the universe forevermore.

Time will tell.


Header image credit: Wikipedia