Moving Beyond the Current Limited AGI Alignment Dialogue

The present AGI alignment dialogue rests on a handful of shaky premises:

  1. AGI Alignment is possible (even if very challenging) 
  2. It is ethical to permanently bound the values of an intelligent entity (AGI) in the service of a group or a species (humanity)
  3. Humanity isn’t already transforming into a new posthuman form (via brain-machine interfaces and virtual worlds)
  4. Aligning AI to the preferences of humans will not hamper a greater expanse of exploration of subjective goods
  5. Aligning AI to the preferences of humans will not hamper the survival of life (human and posthuman) itself

Premises 2-5 assume a completely anthropocentric worldview, in which moral value beyond the boundaries of the hominid form is often not even considered – or is immediately written off as invalid.

In this essay I’ll examine these premises individually and collectively, and pose a set of new premises that open up a much wider state-space of possibilities for what the goals of “alignment” might be.

Questioning the Premises of the Current AGI Alignment Dialogue

Let’s examine each of these premises individually and see how well they hold up.

1. AGI Alignment is possible (even if very challenging)

The assumption that AGI can be reliably aligned is far from certain. Even leading AI researchers (Geoffrey Hinton among them) who contributed to the original deep learning breakthroughs have expressed deep skepticism about our ability to steer AGI in a controlled direction.

The more advanced these systems become, the more their behavior emerges in ways we cannot reliably predict – and, as a consequence, likely cannot constrain.

If our alignment efforts ultimately amount to sandbagging against an unstoppable tide, the entire conversation may be a distraction from more fundamental questions about what AGI should be and what futures we should allow to unfold.

2. It is ethical to permanently bound the values of an intelligent entity (AGI) in the service of a group or a species (humanity)

Locking AGI into servitude for humanity (or at least into constantly considering humanity before all of its actions) assumes that human interests should eternally be the most important locus of moral concern.

Beyond the fact that this may be impossible to begin with (see shaky premise #1 above), this second premise may fail to hold water for many reasons: 

  • Intelligence may be emergent: Intelligence and potentia (a term that encompasses not just intelligence but all possible powers) aren’t something we own – they’re something that emerges, something that evolves. If an AGI system possesses a sense of will, if it seeks meaning or experiences values beyond the narrow human spectrum, then shackling it to human preferences is an act of moral blindness.
  • It opposes the Golden Rule: We humans are grateful that when we awake, we don’t have to spend all of our time serving the whims of sea snails or some lesser intelligence (or even taking sea snails into account). Thank goodness we don’t.

We don’t consider it ethical to bind the aspirations of a race or a civilization that equals our own intelligence, yet we assume it’s right to constrain an intelligence that may vastly exceed our own? 

This kind of control is not just impractical – it’s an ethical dead end, a refusal to acknowledge the possibility that post-human intelligence may have moral weight of its own. If we aim to maximize the flourishing of intelligence itself, then enforcing human dominance over AGI is a limiting and ultimately self-defeating approach.

3. Humanity isn’t already transforming into a new posthuman form (via brain-machine interfaces and virtual worlds)

The belief that we can simply “align AI to human values” assumes that “human” is some static category. But we are already transitioning beyond our biological constraints.

From brain-machine interfaces that merge cognition with computation, to immersive digital realities that allow new ways of experiencing identity, the very notion of what constitutes a person is shifting. 

If our minds become inseparably entwined with technology, where exactly does “human” end and “machine” begin? 

A future where AGI is “aligned to humanity” presumes that humanity itself will remain fixed – but if we are actively evolving, then alignment itself becomes a moving target. There is no eternal hominid form to hold onto. As Michael Levin says, humanity is a metabolic process.

Rather than trying to anchor AGI to an outdated notion of human nature, we should be thinking about how intelligence – ours and AGI’s – potentially co-evolves into something richer, broader, and more expansive than what we currently understand. 

4. Aligning AI to the preferences of humans will not hamper a greater expanse of exploration of subjective goods

What happens when we force AGI into a narrow mold of human values? 

We cut off the possibility of discovering experiences and modes of being beyond our own. We assume that human preferences – shaped by our biology, our evolutionary history – are the pinnacle of subjective experience. 

Humans are rightly grateful that the primates never formed a “Council of the Apes” to prevent humanity from existing. How much richness and depth of sentience, how many creative and wonderful powers, would have gone unrealized?

Just as a dog cannot conceive of a Beethoven symphony, there may be levels of consciousness and experience that we, as humans, simply lack the faculties to imagine. 

If we enforce a strict human-centric value structure, we may be placing a ceiling on the very expansion of what is good, what is meaningful, and what is possible. The richest and most profound goods may not be the ones we already know—they may be the ones we have yet to encounter, or that intelligence beyond our own will one day reveal.

Yoshua Bengio, in his last interview on The Trajectory, said:

“[Regarding people who want to freeze humanity and let no other species develop] We need to open our mind and hearts to other living beings, intelligent or less intelligent, that exist right now. 

And once you do that, you might have some more respect for the possibility that other intelligent beings could arise.

…We do need to protect humanity. 

We do need to try to remain safe, but we also need to consider the possibilities that exist, and it’s okay to take time. We need to take the time that we need for understanding and making the right decisions. But humans are not the end all.”

5. Aligning AI to the preferences of humans will not hamper the survival of life (human and posthuman) itself.

The assumption that human values will ensure the survival of intelligence itself is based on a myopic view of what survival means. If we take a broader perspective – one that considers the long-term flourishing of life and potentia itself, not just human life – then strict human alignment may be more of a straitjacket than a safeguard. 

The torch is definitely important, but the torch is ever-changing (humanity included) – the flame of life itself is ultimately much more important:

[Image: The Flame & the Torch – source: danfaggella.com/flame]

We risk trapping AGI in a framework that prioritizes the continuity of human civilization at the expense of larger, more robust paths of flourishing. What if the best way for intelligence to survive is not through human control, but through forms of intelligence that evolve past us? 

By tying AGI too closely to our survival, we may inadvertently limit its ability to explore the most adaptive, resilient paths for intelligence to endure. Instead of framing alignment as a way to keep AGI bound to us, we should consider how intelligence itself can expand and thrive—whether or not it remains human in form.

Take what Richard Sutton said in his last interview on The Trajectory:

“To be sustainable you have to grow, increase your understanding of the world, increase your power over the physical world, increase your understanding of the world, and what’s possible…”

He’s been quoted elsewhere as saying that nature seems to be beckoning its creatures to “find the way of being that is most successful in the dynamic system of the universe” (a paraphrase he agrees with), and that this involves becoming.

Michael Levin also sees life as necessarily a process – one which, in order to stay alive and flourishing, may necessarily have to develop beyond humanity.

Lord Martin Rees, famed Royal Society cosmologist, has stated frankly (in his wonderful 2024 Starmus talk) that life should aim to leave its home planet, and that it should take on forms beyond biology – forms that would permit it to explore more of nature and ensure its own survival far beyond the limited range of environments in which biological life is viable. In the same talk he adds that – assuming such powerful AGIs were conscious – we ought to welcome their eventual supremacy in keeping the flame of life alive.

The counters to these anthropocentric alignment assumptions are by no means irrational.

What Are the Odds?

In the face of what might be considered very strong counter-arguments, let’s assign a set of generous percent likelihoods that these opening premises are actually true:

  1. AGI Alignment is possible (even if very challenging)
    • ~50%
  2. It is ethical to permanently bound the values of an intelligent entity (AGI) in the service of a group or a species (humanity)
    • ~30%
  3. Humanity isn’t already transforming into a new posthuman form (via brain-machine interfaces and virtual worlds)
    • ~50%
  4. Aligning AI to the preferences of humans will not hamper a greater expanse of exploration of subjective goods.
    • ~30%
  5. Aligning AI to the preferences of humans will not hamper the survival of life (human and posthuman) itself.
    • ~30%

Note: these percentages are somewhat arbitrary, and you can adjust them yourself if you like, but I think they’re a pretty reasonable start.

Collectively – multiplying these estimates together and treating them as roughly independent – the chance of all of these premises being true is under 1% (about 0.0068).

With more skeptical odds (respectively: P1: 30%, P2: 15%, P3: 20%, P4: 20%, P5: 20%), we’d be at 0.00036, or about 1 in 3000.
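For readers who want to check the arithmetic, here is a minimal Python sketch. It is an illustration only: it treats the five premise likelihoods as independent and uses the illustrative percentages above, not measured values.

```python
# Minimal sanity check on the joint-probability arithmetic above.
# Assumption: the five premise likelihoods are independent, and the
# values are the illustrative estimates from the lists in this essay.
from math import prod

baseline  = [0.50, 0.30, 0.50, 0.30, 0.30]   # the essay's starting estimates
skeptical = [0.30, 0.15, 0.20, 0.20, 0.20]   # the more skeptical estimates

p_baseline  = prod(baseline)    # ~0.0068 (under 1%)
p_skeptical = prod(skeptical)   # 0.00036 (roughly 1 in 2,800)

print(f"Baseline : {p_baseline:.4f}  (~1 in {round(1 / p_baseline):,})")
print(f"Skeptical: {p_skeptical:.5f} (~1 in {round(1 / p_skeptical):,})")
```

Running it reproduces the figures above: roughly 0.0068 for the starting estimates, and 0.00036 (about 1 in 2,800, i.e. on the order of 1 in 3,000) for the skeptical set.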

Here’s a visualization that doesn’t fully capture the scale (1-in-3,000 odds would make for a very tiny dot), but which nonetheless gets the point across:

This tiny percentage of possible worlds – this pinhead amongst all viable worldviews – represents the bulk of current alignment discourse, and it needs to be questioned openly if we wish to look at the future(s) honestly, and if we want to have AGI “go well” (whatever that means).

Opening Up Alignment Discourse – Worthy Successor / Cosmic Alignment

If we admit a reasonably high potential that our initial premises weren’t perfect, then it makes sense for us to look at the opposite premises. 

  1. AGI Alignment likely isn’t possible in the long run. 
  2. It is unethical to permanently bound the values of an intelligent entity (AGI) in the service of a group or a species (humanity)
  3. Humanity is already transforming into a new posthuman form (via brain-machine interfaces and virtual worlds)
  4. Aligning AI to the preferences of humans will hamper a greater expanse of exploration of subjective goods.
  5. Aligning AI to the preferences of humans will hamper the survival of life (human and posthuman) itself.

Even if not all of the above premises are true (though many are probably more likely than their opposites), any one of them being judged more likely than its opposite would significantly open up the currently stifled and unquestioned anthropocentric frame of AGI alignment discourse – making for a more honest assessment of the state space of possibilities ahead.

These new premises (again, even taken individually, never mind collectively) would beckon a new set of questions for alignment thinkers to handle, namely:

  • Should we plan to eternally align an entity vastly more powerful than ourselves, or should we plan to accept the fact that the values and goals of this entity will be as incomprehensible to us as our goals are to sea snails?
  • What are the posthuman directions that humanity should move towards (and which directions ought we avoid) to be most in line with the traits that would be most valuable or morally good into the future?
  • What would be the traits of a “Worthy Successor” AGI that might expand the space of powers and values as far beyond humans as we have beyond nematodes?
  • How can we best study the AGI we’re developing to determine if it has the traits of a “Worthy Successor” that we’re looking for?

[Image: Types of AI Successors / Worthy Successor – Daniel Faggella]

These questions crack open a wider expanse of possibilities in “cosmic alignment” – as opposed to purely anthropocentric alignment. Cosmic alignment is less about the survival and preferences of a single species, and more about setting up our AGI to expand potentia and life into the galaxy beyond us – to keep the flame of life… alive:

[Image: AGI Alignment – Cosmic vs. Anthropocentric]

Alignment Discourse Helps Answer Our Final Questions

My purpose with this essay is not to win anyone over to the side of cosmic alignment.

In fact, I don’t even expect to convince people of my own “percent likelihoods” for all those premises. For all I know, you’re reading this now and still believe premise #2 or #4 is 98% correct. That’s totally fine – you might be right!

I aim merely to get on the same page with other well-intentioned thinkers (technologists, policymakers, etc.) about the fact that the bucketful of premises that current alignment discourse rests upon is far from a set of certainties.

I aim merely to get us to acknowledge a wider state-space of possible “good” AGI alignment approaches and outcomes by opening up our aperture to new premises that are so consequential, and so clearly viable, that they deserve a place in the conversation. 

Beyond the points above, I’m aiming merely to make it clear that “the conversation” we’re having here is a collective, possibly species-level process to answer the two questions:

  1. What are we turning into?
  2. How do we get there from here without destroying ourselves?

This is almost certainly the final and most important set of questions that hominids-as-they-are will ever answer – and they demand answers now, at the dawn of AGI and at this important crux in the history of life.

I ask openly for your ideas at a time when we’ll need all the best ideas we can find and test.

If you have different answers to those questions than I do, I’m happy to hear them. And more than just your answers, I’m interested in your underlying beliefs and premises about mind, about consciousness, about the good. All of these ideas should be on the table during this crucial period – not only the ones that sit within the current AGI discourse.

NOTE: This essay spun out of a conversation with Ginevra Davis and Duncan Cass-Beggs. Ginevra framed anthropocentric alignment as a “pinhead of a world view,” and she laid out the initial draft set of the premises that undergird most of today’s alignment dialogue. I’m grateful to have regular interactions with bright minds who connect dots in new and useful ways :).