Green Eggs and Ham – Facing Future Technology and AI Like an Adult
People who fear AGI destroying humanity often fear that AGI will not share human values.
People who advocate for AGI soon often believe that AGI will naturally share human values.
But what are values?
As it turns out, your beliefs about what values are have very serious implications for your perception of AGI’s risks and possibilities.
In this short essay I’ll contrast two hypotheses about what values are, and explain why I believe it’s important to be skeptical of a very common view of values as a kind of editable “module” separate from an entity’s drive to survive.
If I don’t convince you of my own pet theory (which I’m not certain about), I at least aim to cast doubt on a popular and detrimental view of what values are.
The Oxford Dictionary defines values as: “A person’s principles or standards of behavior; one’s judgment of what is important in life.”
Some people hold that values are a kind of inner detection system disconnected from the drive to survive, aimed at finding true “right” and “wrong.”
But this theory seems flatly wrong for many reasons.
Values should be seen frankly for what they are: an extension of the drive to survive (conatus), and a way to expand a conscious agent’s ability to engage with a dynamic environment.
Values seem to be a kind of potentia (the total set of powers an agent can wield in order to survive, which can include anything from camouflage and a hard shell to flight and verbal communication) for some conscious animals, and especially for those that are also social.
In conscious, social animals, it is important to know what will please or displease others, because that knowledge lets you take actions that are more likely to achieve your desired aims. Eating all the food you find may enrage others, so sharing a bit of it allows you to eat as much as you can without backlash.
If a human being living in Boston or Tokyo had the “values” of a field mouse, they wouldn’t just be “different” from those around them; they’d be totally unprepared to live and thrive in their environment. Values adjust as intelligence increases, and they adjust as environments change, resources change, etc. The people of Tokyo today have different values than the people in the same location 2,000 years earlier, as it should be.
Here are a handful of what I consider to be the most dangerous downsides of heading into an AGI or posthuman future with the flawed “Magic Moral Module” idea in mind:
Embracing the conatus theory of values implies the exact opposite of the points I listed in the previous section of this essay, namely:
I don’t 100% believe the conatus theory of values is true. But I do consider it to be very likely, and I think that maintaining a healthy skepticism of the “Magic Moral Module” perspective is very important as we move into an era of potentially posthuman intelligences.
/end of main essay/
This essay partially springs from recent conversations on The Trajectory with Scott Aaronson (2024) and Eliezer Yudkowsky (not yet published as of Dec 2024).
This essay isn’t a permanent claim that I’m correct, or that their positions are wrong (I’m just a neuron here); it’s merely meant to highlight what I consider, philosophically, to be the crux of why I disagree with them. They may be right.
(Note: I’m not claiming that either of these thinkers fits into the most naive form of the “magic moral module” theory, but I’m using their ideas as examples.)
Aaronson seemed to think that there may be a kind of moral “bedrock” at which human and all posthuman intelligences may arrive. He posits that it isn’t unreasonable to suspect that the golden rule is a kind of “2 + 2 = 4” of morality for all intelligent agents. I hope my essay above expresses why I think this is likely to be incorrect. There may be knowledge that stays the same as an AGI’s mind expands, and there may be some ways of acting that an AGI would hold onto for as long as they’re useful, but I don’t see “values” as existing outside the skulls (imaginations) of social mammals. And even if they could (i.e. if we could eternally lock “values” into an AGI), I suspect this may be overtly morally wrong.
Yudkowsky seems to suspect that values such as “fun” (he has a unique and interesting definition of this term) and “caring” should be somehow eternally preserved. He has a theory about how super-enhanced uploaded humans might eventually “lock in” said values (i.e. said “moral module”) into an AGI that might then go off and populate the galaxy. I think that “caring” and “fun” are fine and lovely values, and not ones we should be quick to abandon, but for all the reasons above in this essay, I think it’s unlikely that they could be ossified, and I think we should not ossify them in the long term.
A quick quote from the Concord Sage, that great articulator of what is inaccessible to man, but still is or should become:
“Our life is an apprenticeship to the truth, that around every circle another can be drawn; that there is no end in nature, but every end is a beginning; that there is always another dawn risen on mid-noon, and under every deep a lower deep opens.

This fact, as far as it symbolizes the moral fact of the Unattainable, the flying Perfect, around which the hands of man can never meet, at once the inspirer and the condemner of every success, may conveniently serve us to connect many illustrations of human power in every department.” – Emerson, Circles.
It may be that Scott and the AGI superman would both arrive at the same “flying Perfect” notion of morality.
It is possible that the values espoused by YUD (that’s what they [affectionately?] call him on X, anyway… I sure hope that’s not how they abbreviate my name) are the right kind of eternal “flying Perfect” to which we should hitch all AGIs as they joyously populate the galaxy.
But I suspect we are in apprenticeship to the truth, and that claiming confidence in what we have hominid-mind access to isn’t the path we should take.