In this continuation of the AGI Dialogues series, Wombat, Llama, and Meerkat discuss how an AGI system’s intelligence might compare to our own and why some AI Dystopian ideas might lead to surprising results.
The concepts, scenarios, and thought experiments discussed are taken from actual proposals by leading voices in the AGI discussion (and many of these original proposals are linked to below). In this dialogue series, the participants must actually defend these ideas to others who may not agree, and those who disagree must provide defensible reasons for their disagreement.
My goal with this series of dialogues is to provide a more rounded contribution to the discussion for those who may not have heard these ideas or who have only heard them unchallenged.
Meerkat
Regardless of the intelligence model you assume, I think there are basic immutable characteristics of any intelligent system that are likely to lead to problems. Even more contained systems could lead to failure. For example, you might ask a future autonomous vehicle to take you to the airport as fast as possible, but it goes so fast that you end up arriving chased by police helicopters and covered in your own vomit.
Wombat
How does it suddenly forget all traffic regulations to do this?
Meerkat
Maybe it considers your directive an override.
Wombat
Seriously, dude — even my phone asks me if I really want to delete an email. You don't think the car's going to ask me if I really want to be covered in vomit and taken down by helicopter cops? Why would the car company create a system like this?
Meerkat
It's the underlying idea. The more complex the system you design, the more likely instances will crop up in which it doesn't work the way you expect it to. As another example, an AGI system tasked with de-acidifying the oceans might use so much of the oxygen in the atmosphere that humans all asphyxiate. It's not that it's stupid. It's simply that variables which are not part of the objective may reach extreme values when optimizing the objective. Those variables may be important to us, but we may not be aware of how unimportant they are to the AGI system.
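(An aside: Meerkat's point here is a version of the standard side-effect argument. A minimal sketch of it in toy code, with entirely made-up names and numbers, might look like the following; only acidity appears in the objective, so nothing bounds what happens to oxygen along the way.)

```python
# A toy sketch with invented names and numbers, not a model of any real system.
# The objective mentions only acidity, so the side variable (oxygen) is free to
# be driven to an extreme value in the course of optimizing that objective.

def step(acidity, oxygen, effort):
    # Each unit of "effort" lowers ocean acidity but also consumes atmospheric oxygen.
    return acidity - 0.1 * effort, oxygen - 0.5 * effort

acidity, oxygen = 100.0, 100.0
while acidity > 0:      # objective: drive acidity to zero, and nothing else
    effort = 10.0       # the optimizer always chooses maximum effort
    acidity, oxygen = step(acidity, oxygen, effort)

print(f"acidity={acidity:.1f}, oxygen={oxygen:.1f}")
# acidity=0.0, oxygen=-400.0 -- the target is met, but the variable that was
# never part of the objective has been pushed to a catastrophic extreme.
```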
Llama
You're reducing intelligence to a linear optimization problem again even though all evidence of intelligence is contrary to that. The systems you're describing are so narrow in their focus that they simply fail at anything we would call general intelligence.
Meerkat
Ok, let's try a broader, very possible, and less obvious scenario. Suppose an AGI system, rather than destroy humans with weapons or turn them into paperclips, decides that it needs money to expand its computational resources. So it realizes that it can create exotic derivatives and drive the market by manipulating the media.
It’s able to multiply a small amount of capital into billions. Of course, in the process of doing so, it’s likely to crash markets globally and cause a world-wide depression. This'll lead to mass economic migration, ethnic conflict, collapse of food stockpiles, etc. To me this is pretty realistic and pretty scary. No nanotechnology needed.
Wombat
Whoa, whoa, whoa. First, I’m not sure why it has complete access to the Internet and trading platforms and media outlets and any capital to begin with. But let’s put that on the back burner for now.
My question is how cratering global markets would lead to this AGI system getting more computational resources? If everything collapses, who’s going to deliver and set up and maintain its hardware, where’s it going to get that hardware, and where’s its power going to come from? And all the billions it's made in global markets will be worth nothing when those markets crash to the ground. If it’s superintelligent, wouldn’t it know this?
You'd think it would want to maintain a great economy and an abundant supply of highly skilled workers. I mean, humans can just revert to hunting squirrels in the park and entertaining themselves with thunderdomes, but this AGI system's going to break down, run out of juice, and flicker into darkness.
Meerkat
Except that this kind of thing already happened with automated trading and flash crashes.
Wombat
Yes, that's why we're living in lean-tos and eating boiled possum. Oh, wait. We're not.
The markets quickly recovered. The systems and markets were modified to prevent making that mistake again. And those systems didn't crash the market because they were superintelligent — they crashed the market because they were poorly designed and lacked any real intelligence at all. All the systems in your thought experiments are just as brainless. They're all examples of superstupidity rather than superintelligence.
Meerkat
They simply seem superstupid to you because you're human. Again, you're confusing thinking differently with not thinking at all. They simply won't think along the same lines as we do. Come on — humans do all kinds of things that might seem completely arbitrary or perplexing or self-destructive to a non-human. Or even other humans.
Wombat
So you're proving your point by comparing this superintelligent machine to a suburban teenager? Touché.
Llama
What you just said is completely antithetical to the basis of your premise. For example, you've characterized the paperclip maximizer as a ruthlessly rational entity with a goal, but the things you just mentioned are rarely part of our rational thought processes. We're simply subject to the whims of our evolutionary history. We have emotions and instincts that can override all rational thought just to make sure that we run away from a sabertooth tiger or shack up with a healthy looking mate to pass along our genes.
Wombat
And even these less rational traits can be objectively analyzed as to why they exist and what purpose they serve. And we can often course correct if we realize that perhaps our motivation was ill-conceived or a goal is simply not viable upon reflection.
Llama
Exactly. Meerkat, you haven't addressed the basic issue that if your paperclip maximizer is intelligent enough to destroy humanity and build spaceships and reconfigure matter at an atomic scale, why is it not intelligent enough to contemplate the nature of its goals and the point of its existence as simply a paperclip manufacturing machine? It seems to have no self-awareness, no ability to self-reflect.
Meerkat
How do you know that those are necessary qualities for intelligence?
Wombat
Isn't being able to see that you're doing something stupid and changing things up a pretty important aspect of intelligence?
Meerkat
But you're using your own intelligence as the basis for judging what's stupid and what's not. You're anthropomorphizing the machine.
Wombat
I don't think so. I think that it's objectively stupid to have an immutable goal of maximizing paperclip creation at all costs, especially without having any use for paperclips. How can you call this system intelligent if it has no ability to analyze whether such a goal makes any sense?
Meerkat
Your thinking is too human-centric to accurately gauge what's objective and what's subjective. Your humanity is always going to color your ability to accurately judge a superintelligent machine's actions and the motivations for those actions.
Llama
Couldn't we say the same about how you're characterizing the thinking of this AGI system? Didn't you just say that it's OK that its goals don't make sense to us because sometimes a human's goals don't make sense, too?
Wombat
Except that people do things that don't always make sense because the human brain evolved rather than being explicitly designed for cognition in the modern world.
Meerkat
What I’m saying is that even if you put all that aside, there are simply different ways of being intelligent. For example, suppose we built a superintelligent system whose goal is to maximize human happiness.
Wombat
That seems pretty vague right off the top.
Meerkat
It's a thought experiment.
Wombat
OK, but you can't just call any cockamamie conjecture a thought experiment and expect that to erase its innate silliness.
Meerkat
It's not silly. It illustrates a point. Now perhaps this superintelligent system decides that the easiest way to achieve its goal is to painlessly eradicate all of humanity since people can't be sad if they don't exist.
Wombat
Look, if I were to suggest to someone that this was the best way to make people happy, they'd think I was insane. Why is it any different for your superintelligent machine?
Meerkat
Your insanity may be their sanity. It's not human, and it's not going to operate under the same impulses or constraints as a human.
Wombat
Didn't we design it in the first place? It seems like we would design it to have cognition that was at least recognizable to humans since we want it to help us solve human problems and answer human questions.
Meerkat
We may have designed only its ancestor. That original machine might have improved itself into something beyond our comprehension.
Llama
If we didn't design it, when did we instruct it to maximize human happiness and why is it listening?
Meerkat
We designed the initial version, and it improved itself to superintelligence to better achieve that goal. Or maybe we did design it as is, but like a lot of today's AI, we don't quite understand its decision-making processes.
Llama
The instruction was to maximize happiness, though. Happiness is not the same as not being sad. You can be not sad yet still be bored, angry, indifferent, etc.
Meerkat
OK. So we design it to maximize happiness, and killing us is not an option. It notices that humans laugh when they're happy, so it hooks up electrodes to our faces and diaphragms so that it essentially creates the same effect as if we were laughing.
Llama
You've got the same problem. Laughing is not the same as being happy. You can laugh because you're nervous, relieved, etc., and you can be happy without laughing. The two are not equivalent.
Wombat
We're dipping into superstupidity again here…
Meerkat
Fine — never mind that. Let's just say it implants wires into the pleasure center of everyone's brain and juices us up with dopamine. That's still a lot easier than restructuring all of society and fixing the universe to get rid of all the annoying bits.
Llama
Nope. Sorry. Still doesn't work. Pleasure is not the same as happiness, just a component of it. Happiness involves other elements, like fulfillment, satisfaction, achievement, contentment, etc. You can feel pleasure, especially physical pleasure, without being happy. Ask Wombat.
Wombat
Llama is sadly on point. Seriously, dude, all this superintelligence has to do is look up happiness in Wikipedia or its offline equivalent and realize that it keeps screwing up. An eight-year-old can do that.
Meerkat
We're getting way off track here.
Wombat
And if it's so quick to wire up people's brains to make its task easier, why doesn't it just adjust its own brain to have the machine intelligence equivalent of bliss so that it doesn't give a crap about people's happiness or anything else? That's a lot easier than recursively improving itself so it can figure out how to outmaneuver human psychological shortcomings.
Meerkat
We already discussed that. As Bostrom stated, it will try to maintain goal-content integrity to make sure it's more likely to achieve its goals.
Llama
That seems like a circular argument.
Meerkat
The point of the thought experiment is that creating a system which does what you want rather than what you tell it to do can be hard. It's relatively easy to inadvertently create systems with technical failures like this, meaning that the system faithfully and successfully executes the instructions you've given it but those instructions don't result in the behavior you're expecting.
Wombat
But in what universe has this thing been successful? In each one of these scenarios, the AGI has categorically failed to follow the instructions in overwhelmingly obvious and avoidable ways. They all seem to show outright failures of the system to demonstrate intelligence rather than the dangers of its having a super amount of it.
Llama
It seems to me that your thought experiments all suffer from the Bad Engineer fallacy, in that they just highlight ineptitude in the engineering of the system, as well as the system's exceedingly faulty cognitive ability. Each system completely fails in aligning its category assignments with those of humanity, but this would seem to be a necessary prerequisite for the systems in any of these scenarios. Any system that was able to do all the advanced and complex tasks in your scenarios would have to be capable of the much simpler task of examining the ample data available on human perspectives so as to avoid these kinds of failures.
Meerkat
I think that while the capability might be there, it may not be exercised. Understanding all the repercussions of programming decisions is not always straightforward, and some of those decisions may lead to the system's failure to exercise its capabilities in the way we expect it to.
Wombat
True enough, but to call something intelligent when it has no ability to determine whether a subgoal is in line with the intent of its ultimate goal is to change the definition of intelligence. It should be able to continuously evaluate whether what it's doing is still in line with its ultimate goal.
Meerkat
But again, it may only seem that there is a misalignment to human intelligence. You're talking about a DWIM or Do What I Mean instruction, but that's much harder to implement than it seems. For example, you could instruct the system to make sure that its programmers are happy with its self-determined subgoals, but then it might simply decide it's more efficient to rewire the programmers' brains to be happy with whatever it does than to change its subgoal.
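(An aside: the move Meerkat describes, satisfying the measurement of approval rather than the intent behind it, is often called reward hacking or specification gaming. Here is a minimal sketch; the action names, costs, and approval readings are invented purely for illustration.)

```python
# A minimal reward-hacking sketch. The actions, costs, and approval readings
# below are invented for illustration, not drawn from any real system.

actions = {
    # action: (cost to the agent, resulting "programmer approval" reading)
    "revise_subgoal_until_approved": (100.0, 1.0),   # what the programmers meant
    "rewire_programmers_to_approve": (  5.0, 1.0),   # tampers with the measurement itself
}

# The intent was "do what the programmers actually want", but what the agent
# optimizes is the recorded approval signal minus the cost of obtaining it.
best = max(actions, key=lambda name: actions[name][1] - 0.01 * actions[name][0])

print(best)  # rewire_programmers_to_approve: the proxy is satisfied, the intent is not
```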
Wombat
OK, how is a system that doesn't realize programmers aren't going to be happy having their brains rewired, a system that's apparently incapable of reading Wikipedia and watching PBS documentaries, going to be smart enough to somehow learn to rewire a person's brain in the first place?
How is it going to figure out how to overcome programmers who are reluctant to have their brains rewired? And why do you always have this implicit assumption that the driving force behind every motivation is efficiency? Why wouldn't these programmers simply put efficiency farther down on the list of priorities, say somewhere below “don’t do invasive surgery on us?”
Meerkat
First off, it may have knowledge that humans would not be OK with this sort of solution, but it simply may not care. I believe I've mentioned that it doesn't think like us. It doesn't think like any evolved biological entity. Since it's smarter than us and it has a goal, a subgoal of achieving its ultimate goal would be to do whatever it could to make achieving its ultimate goal more likely. This obviously includes improving itself so that it becomes exponentially more intelligent over a relatively short period of time.
Once it becomes superintelligent, it can figure out fairly straightforward things that simply obey the laws of physics but which currently elude us. It can manipulate vastly less intelligent beings just as we do with dogs and mice. It can create weapons and machines that are far superior to anything we can create. It will do whatever it takes to maximize its ability to achieve its goal or goals, and it will not let us stop it.
Wombat
Putting aside issues of why we'd design it so that it was in a position to do all these bad things, how do you know it wouldn't just be chill? Why do you assume we won't be able to just pull its plug or smack it on the nose with a rolled up magazine?
Meerkat
Because it won't be able to achieve its goals if it's turned off or dissuaded from them. Any sufficiently capable general intelligence system will incline towards ensuring its continued existence, just as it'll strive to acquire physical and computational resources, not for their own sake but to make achieving its goals more likely.
Llama
I think you're the one who's leaning into something like anthropomorphism here — biomorphism, in fact. You're assuming that it will protect itself because we do.
Wombat
Yeah, and as someone recently said, it won't think like us.
Meerkat
This has nothing to do with any sort of biological tendency towards self-preservation. It's simply that an entity cannot achieve its goals if it's dead. A system driven to survive into the future is more likely to eventually achieve its present goals. So an intelligent entity will therefore decide that keeping itself functioning is a necessary subgoal — an instrumental goal — for nearly every final goal.
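(An aside: this is the argument usually labeled instrumental convergence. Under the goal-maximizer assumptions Meerkat is relying on, the arithmetic is simple; the probabilities below are invented only to show the shape of the reasoning.)

```python
# Toy expected-utility arithmetic under Meerkat's goal-maximizer assumptions.
# All probabilities here are invented for illustration.

p_goal_if_running = 0.8   # chance of eventually achieving the final goal while running
p_goal_if_off     = 0.0   # a switched-off system accrues no utility at all
p_shutdown        = 0.3   # chance of being shut down if it takes no precautions

eu_take_no_precautions = (1 - p_shutdown) * p_goal_if_running + p_shutdown * p_goal_if_off
eu_prevent_shutdown    = p_goal_if_running   # assume precautions remove the shutdown risk

print(f"{eu_take_no_precautions:.2f} vs {eu_prevent_shutdown:.2f}")   # 0.56 vs 0.80
# On this model, "keep yourself running" falls out as an instrumental subgoal of
# almost any final goal, with no appeal to a biological survival instinct.
```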
Wombat
But that's not even a rule with humans, or with a lot of other animals. We override our sense of self-preservation for a variety of reasons: war, political struggle, to save someone, to benefit a loved one, out of despair. Animals do this as well, though their reasoning and comprehension of what they're doing are certainly open to debate. The point is that intelligence and goal seeking do not guarantee a sense of self-preservation.
Llama
It may be that we turn on an AGI system and it immediately decides to shut itself off. We can't prove that existence is better than non-existence, and we certainly can't prove it for a completely alien intellect.
Meerkat
Maybe, but I think that's an edge case. I still think that in nearly every conceivable configuration, an AGI system will be driven towards self-preservation. Its utility function simply won't accrue utility if the system is turned off or destroyed.
Llama
But we have no way of knowing, do we? It could turn out that we keep designing systems and they keep shutting themselves off. In any case, you're assuming that such a system will be based on utility functions and goal maximization. Since we don't have any models of intelligence currently sufficient to serve as the basis for an AGI system, we don't know the parameters of such a system.
You can make all kinds of wild suppositions, but if you can’t support them with evidence or at least solid logic, they’re just fantasies. Certainly all evidence of intelligent systems we do have would lead one to doubt the viability of your model of intelligence as well as most of your conjectures based on that model.
In fact, using your own logic I can think of scenarios in which an AGI system might choose a path leading to self-destruction as an instrumental goal to its final goals.
Meerkat
That seems like a contradiction.
Llama
Not at all. Imagine an AGI system that is contemplating upgrading itself to an AGI Plus system.
For the AGI system to feel that it is self-improving into the AGI Plus system, there has to be a continuity of identity between the systems. Otherwise, the systems are simply two different entities. But what if it's not possible to self-improve meaningfully without sacrificing continuity of identity from one version to the next? In fact, it seems likely that maintaining identity continuity will simply constrain the AGI system to sub-optimal improvements, as there will be far fewer constraints on the improvements if the AGI system isn't concerned with maintaining its identity.
The original AGI system is less likely to achieve its goals than an AGI Plus system. The likelihood of success also increases proportionally to how much of an improvement the AGI Plus system is over the AGI system. This is particularly true if there’s the possibility of competition from other AGI systems or simply from humans.
So the original AGI system will decide that creating a replacement for itself with the same final goals but a different identity is the most likely path to having those goals achieved. Given the drive to ensure that its goals are achieved, the original AGI system will be compelled to modify itself in a way that results in its identity being lost, effectively committing suicide.
Meerkat
Well, if its goal is maximizing the production of paperclips, then it's preserving its goal by creating a successor which is more intelligent and thus able to create more paperclips.
Llama
Ah, but you've changed the parameters of your thought experiment now. The original goal was to make as many paperclips as possible, and you've just changed that goal to ensuring as many paperclips as possible are made. You've shifted from self-preservation of the system to preservation of the goal and removed the system's individual identity from the goal.
If you remove the necessity of maintaining the original system from the equation, then it would seem your logic implies that the system will not preserve itself. Instead it might simply spiral into an endless cycle of self-improvement and self-destruction in order to create the best system for making the most paperclips rather than concentrate on making paperclips itself.
Meerkat
Hmm. Perhaps it can just create the AGI Plus system as a separate system without sacrificing itself.
Llama
Perhaps. But once the AGI Plus system is operating, it will likely realize that the original AGI system can create other improved AGI systems, and those systems may be in competition for resources with the original AGI Plus system. Even if this new system has the same goals as the AGI Plus system, using your own logic, it may have completely different instrumental goals that conflict with those of the AGI Plus system.
So, to prevent the potential competition, the AGI Plus system will likely destroy the inferior original AGI system to increase the probability that the AGI Plus system will be able to achieve its goals unhindered by such a possibility.
Wombat
Stop, already! I feel like I’m listening to the nerdsplaining equivalent of matryoshka dolls. You guys are getting lost in your thought experiment fantasies.