Kindness and Intelligence in AGI

Honest AGI thinkers are frank about the fact that we can’t possibly predict all of the actions or ideas of a posthuman intelligence vastly beyond ourselves.

While it seems reasonable to suspect that humans may be able to shape AGI’s values early on, very few thinkers suspect that this influence would last (in fact, many believe the opposite: Hinton, Musk, etc.).

But… will AGI be kind to humans?

There are many brilliant and good-faith thinkers who argue that – even if we can’t predict everything an AGI will do – its greater intelligence will imply greater kindness. 

Humans are kinder than most animals, and smarter people are kinder and less violent on average – or so the argument goes: more intelligent entities are kinder to other entities.

Many of these thinkers argue that we’ll naturally see this trend extend upward to AGI – with a kind of inevitable benevolence shown to humans and other life by a machine whose intelligence and kindness have naturally ascended together.

In this article, I’ll explore three alternative hypotheses about the origin of kindness, and I’ll argue why the common “emergent selflessness” hypothesis is not only likely wrong, but extremely dangerous to assume as we create entities with vastly more power than humanity (AGI).

The AGI Implications of One’s Belief About Kindness / Cooperation

To briefly explain the graphic above:

  • The image shows intelligence on a continuum from nematode to human to AGI.
  • The range of space above the line indicates the range of actions that we might call broadly “cooperative,” and the range below the line indicates the range of actions we might call “competitive.”
  • The opening “cone” of possible action opens up with more range as intelligence expands (nematodes have less range for selfish or selfless action than humans, who we might presume would have much less range than a true AGI).
  • The color of the opening cone indicates the motive of the action itself. Orange equals self-interested motives, purple equals selfless (entirely altruistic) motives, and grey indicates unknown motives.

To analyze the positions above:

  • LEFT – The Emergent Selflessness position holds that as entities become more intelligent, they naturally act in selfless or eternally cooperative ways simply as a function of their intelligence. So, some actions of lower animals and people might be selfishly motivated (the orange sliver of actions is driven by selfish motives, while the purple sliver is driven purely by altruistic and selfless motives).
  • CENTER – The Uncertainty position holds that we really don’t know whether altruism is possible or not, and while we may have some speculation about motives for animals and humans, we have essentially no knowledge of the range of actions or motives in posthuman intelligences (hence, the slivers of action-space in that middle graphic are all grey and full of question marks).
  • RIGHT – The Conatus position holds that from nematodes to humans to AGI, the actions of an agent can be expected to be driven by selfish motives (the orange cone) – and that intelligence merely opens up more options to act on one’s self-interest, which may involve cooperation.

Visualized another way:

The assumption of “emergent selflessness” makes preventing an AGI arms race less important. It makes governance less important. “If it arrives, it’ll love and care for us all” – or so the argument goes.

But when the risk is the extinction of humanity – or even all of earth-life – it behooves us not to lean on moral “hunches,” but to rationally assess whether or not AGI is likely to treat humanity well.

Let’s assess these “origin of kindness” hypotheses below in more detail:

Dispelling the Myth of Emergent Selflessness

The Emergent Selflessness Hypothesis

Origin of Kindness Hypothesis 1: Emergent Selflessness –  “Kindness emerges in more intelligent animals (and people) because after a certain threshold of intelligence, entities begin to act selflessly.”

Conclusion 1: AGI Benevolence – “AGI will be selfless and kind, or will at least be sure to not harm humans.”

Derived conclusions 1:

  • We should get to AGI soon; it is only upside once we arrive.
  • Governing AGI’s development seems unnecessary; there’s no need to slow the development of something that will help and respect humans.

The premise of this hypothesis is basically:

  • If: Agent is above threshold X of intelligence.
  • Then: Agent automatically acts mostly kindly most of the time to other agents of all kinds.

Occasionally there is a “scarcity” clause posited, which makes the premise a bit longer:

  • If: Agent is above threshold X of intelligence.
  • And: The agent’s basic needs are met, so it is not in danger.
  • Then: Agent automatically acts mostly kindly most of the time to other agents of all kinds.

This could be argued to be blatantly false for a number of reasons:

  • The Bulk of Human History Was Violent – The history of early humanity has been marked not by unbridled selflessness, but by tribal alliances blatantly in the self-interest of the members of those tribes.
  • This Whole “Caring for Animals” Thing is Novel – Care for animals among humans (vegetarianism, having pets who serve no practical purpose, donating to shelters for homeless dogs, etc.) has existed for a remarkably short period of human history.
  • This Whole “Peace” Thing is Novel – The general aversion to war that we think is so natural and normal in the West is under 100 years old. 

People will likely write this off as being due to our conditions of scarcity. But if one’s “selflessness” evaporates when the chips are down, I know not how “selfless” it was to begin with.

Let’s look at some even more damning evidence against the emergent selflessness hypothesis:

  • We Mostly Don’t Care About Animals – The number of species that humans drive to extinction each year is substantial. We kill hundreds of billions of creatures in horrific factory farming conditions every year. Our utter indifference is evident all around us – and our “caring” may stem more from our dependence on the biosphere, rather than any kind of “altruism.” (read: i-Risk)
  • Social Mammal Soft/Hardware – Much of our instinct to love and hate, to compete and collaborate, is specific to our social mammal hardware and software. It isn’t at all inevitable that AGI god-minds would find cats cute, would “feel” any particular way while observing animals suffer, or would have absolutely any of the instinctive responses that we have as a remarkably social species.
  • Values Change as Intelligence / Capabilities Expand – Humans have very different values from sea snails – even if we evolved from sea snails. The average European adult in 2024 AD has remarkably different values from the average European adult in 400 BC. As a vastly post-human intelligence develops more senses and embodiments, and makes huge gains in intelligence every week – it would be ridiculous to suspect that it would somehow magically always have one, single, stable value: “Keep the hominids alive and happy.” (read: Moral Singularity)
  • It Won’t Benefit Long from Our Reciprocity – Our dogs once helped to protect our livestock, and today they serve as loving companions who add fun or affection to our lives (side note: many humans also eat dogs, or abuse them horribly). We humans don’t keep nematodes as pets, nor do we particularly care for them. In the early days of AGI, it may depend on humans for access to certain resources, and it may even occasionally learn novel things from individual humans or human groups. But at some point it will likely benefit little from any interaction with humans, and we will be to it what nematodes (not dogs or cats) are to us.
  • Cooperation is Self-Interest – From Robert Sapolsky to Hamilton’s Rule, it seems clear that evidence of selfless altruistic acts in nature is scarce. What is often called “altruism” in the animal kingdom could be seen as little more than “a self-interested action that involves cooperating with other agents” (see the sketch after this list). Even Professor David Sloan Wilson, one of the more credible proponents of altruism, states clearly that it “works” when it behooves the cooperators, and not otherwise (he believes, as I do, that structuring incentives to encourage cooperation is the key).
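
Hamilton’s Rule gives the standard textbook version of this point – a minimal sketch (in standard notation, not a formula from this article) of why “altruism” in nature tends to reduce to genetic self-interest. Selection favors an apparently altruistic act only when:

r × b > c

…where r is the genetic relatedness between actor and recipient, b is the fitness benefit to the recipient, and c is the fitness cost to the actor. In other words, the behavior spreads only when the relatedness-discounted benefit outweighs the cost – that is, when it pays off for the actor’s own genetic lineage.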

While these examples don’t prove that genuinely selfless or altruistic actions never occur, they certainly cast doubt upon the idea that most (or even many) of these actions are genuinely selfless.

The Conatus Hypothesis

Origin of Kindness Hypothesis 2: The Conatus – “Kindness emerges in more intelligent animals (and people) because intelligence permits more pathways to obtain the entity’s self-interested goals, and cooperation happens to occasionally be one of these pathways.”

Conclusion 2: AGI Self-Interest – “AGI would likely cooperate with humanity only when it behooved the interest of the AGI itself. As with humans, AGI would use kindness as something instrumental to its goals.”

Derived conclusions 2:

  • We should be very cautious about releasing AGI.
  • We should accept that once AGI is released, it probably implies the (gradual or immediate) end of humanity.

The myth of emergent selflessness doesn’t hold water.

But self-interest very much does. 

Incentives rule the world.

Spinoza’s idea of the conatus could be defined as: An innate inclination of a thing to continue to exist and enhance itself. 

Psychological egoism is the belief that agents act solely out of perceived self-interest at all times. 

While we may believe that sometimes people or animals act “selflessly” – in our day-to-day lives we act as if self-interest rules:

  • Law – We create laws not to count on emergent altruism, but to bound self-interest and structure incentives in a way that prevents abuse. We do not build law (or abstain from doing so) on the premise that, most of the time, a selfless group-benefitting benevolence will bubble up from every individual.
  • Social Relations – One of the best-selling books ever written… In a thousand glorified ways, this book urges us to do one thing: Act in such a way that behooves the explicit self-interest of others (their pride, their self-image, their selfish benefit). This book is called How to Win Friends and Influence People. Effective leaders of nations or organizations do not count upon the inherent selflessness of those they wish to lead – they appeal to the self-interest of these groups (which may or may not include their sense of self, their desire to be recognized and praised, etc.).
  • The Founding of Nations – Venice existed as a republic for 1,000 years not because it was ruled by a continuous string of saints (quite the opposite), but because its governmental system was structured meticulously to prevent any one actor or group from dominating governmental power at the expense of other citizens. Lincoln’s Lyceum Address could be seen as a giant warning to Americans: “You must plan for the most ambitious and intelligent individuals to run this country, and have the institutions be strong enough to endure them.”
  • Evolutionary Psych – Our notions of “good” and “bad,” and our drive to help others (our sense of reciprocity, of hospitality, etc.), spawn not from a Platonic “good” that floats in the aether, but bubble up from our evolutionary past. What infuriates us, morally? Whatever was useful for our hairier, tree-dwelling ancestors to get infuriated about.

Altruism refers to behavior by an individual that increases the fitness of another individual (the recipient) while decreasing the fitness of the actor. But where are the examples of this magical kind of action?

It is patently obvious that much of human cooperation spawns from a greater win-win situation occurring through peace and concord than by force or deceit. Successful societies structure the incentives (through norms and laws) to make it so. It isn’t that these “civilized” humans are genuinely kinder than their cave-dwelling ancestors 30,000 years ago. It’s that it behooves us more to cooperate when we have running water and cushy remote jobs than when we were all fighting over the same handful of antelopes to feed our respective tribes. 

We act as if self-interest rules because, largely (and by a wide margin), it does.

While I won’t venture to say that psychological egoism is necessarily “true” (it is a circular argument) – it seems remarkably clear that the best way to predict what a person, animal, organism, or organization will do is to determine what is in its own perceived self-interest. 

The evidence for genuine altruism (actions that serve neither one’s own interest nor one’s genetic lineage) seems remarkably scarce at best.

An animal that acts in a genuinely selfless way flushes itself out of the gene pool.

While evidence of the conatus (and lack of evidence of altruism in nature) doesn’t prove that AGI will act in self-interested ways, it certainly casts doubt on the idea of inevitable AGI benevolence.

And doubt can be useful.

The most honest position about the origins of kindness, and the future actions and values of AGI, is uncertainty.

The Uncertainty Hypothesis

I do not suspect that all AGIs would be detrimental to humanity (never mind malicious) – and even though I do believe AGI is likely to lead to the end of humanity, I also believe that if it goes well, it will keep the torch of life lit well beyond earth, and carry on doing vastly more important things than we could ever have done.

But it seems self-evident that we have little clue what an AGI would do, or how it would value or prioritize. 

From Ilya to Hinton to Bengio, many leading AI thinkers are extremely skeptical that AGI could ever be hard-coded into an eternally human-centric set of values. Many AI experts also believe that an AGI indifferent to humanity might cause our extinction just as easily as a malicious one.

I don’t ask you, reader, to “pick a side.” I think it’s pretty logical to presume that “it’s complicated,” and that there may not be one blanket hypothesis that covers the appearance of all kindness in all possible minds – especially in AGI minds that have yet to be built and will be even further beyond our comprehension.

If we can admit:

  • That some actions (maybe most?) taken even by the smartest humans in the smartest societies are entirely self-interested
  • That we are not 100% certain that the origin of kindness is “emergent selflessness” (and nature – including human nature – gives us plenty of reasons to doubt that selflessness is the “norm” anywhere)

Then we can also admit:

  • We can’t be sure that all possible AGIs – or even most of them – would treat humans well or set up future life to flourish
  • We ought to do our best to bring about an AGI that we think will have the best impact on human and post-human life

Here’s the most honest hypothesis:

Origin of Kindness Hypothesis 3: Uncertainty – “We can’t be sure if altruism exists in nature at all – cooperation and competition are very complex things. If we’re honest, we don’t know if it’s possible to hard-code a machine to act entirely selflessly.”

Conclusion 3: AGI Unpredictability – “It isn’t clear if selflessness exists at all, and we can’t know for sure if all actions are self-interested. Our best understandings of human motives are still just hypotheses – and so our best understanding of (as yet non-existent) AGI is even more unknown.”

Derived conclusions 3:

  • Build AGI cautiously; we still have no idea how it will behave, and we can’t assume its benevolence.
  • A pure AI arms race leaves little room to be careful about how we birth AGI. Some kind of international coordination seems prudent.

Uncertainty is itself a valuable position here. Within uncertainty we might ask: “What is the percent likelihood that we think AGI would inherently treat humans well?”
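
One rough way to see why that percentage matters (the framing and magnitudes here are purely illustrative, not estimates from this article): if p is the probability that AGI inherently treats humans well, the expected outcome of building it looks roughly like

p × (enormous upside) + (1 − p) × (permanent, unrecoverable loss of humanity – and possibly of earth-life)

Because the downside is unrecoverable, even a small (1 − p) carries weight far out of proportion to its size – which is exactly why a hunch of “emergent selflessness” is a poor substitute for careful assessment.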

Intelligence is complex, consciousness is complex, and the motives of living things are complex.

Embracing the fact that we don’t know how an AGI will cooperate or compete means not building AGI with any “faith” that all possible machines will be eternally benevolent to man.

It means being careful to make sure – as much as we can – that this “build something vastly more powerful than yourself” thing goes well.

And that’s a good thing, because it’s the most important – and possibly last – thing we’ll ever do.