Thought Experiments of Existential Disaster
Consider the following three thought experiments:
The Paperclip Maximizer
First described by AI researcher Eliezer Yudkowsky and discussed in a previous post, this thought experiment involves a superintelligent system designed to maximize the production of paperclips, a seemingly harmless endeavor. In pursuit of that goal, it goes on to turn every atom in the universe, humans included, into paperclips.
The Happiness Maximizer
In a paper first published in 2011 and then revised in 2013, former executive director of the Machine Intelligence Research Institute Luke Muehlhauser suggested a thought experiment in which a superintelligent system has been created to maximize human happiness. It decides that it's more efficient to rewire the human brain so that humans are happiest when sitting in jars rather than to try to create a utopian world that caters to the complex nuances of existing human brains.
The Cure for Cancer
In his 2019 book Human Compatible: Artificial Intelligence and the Problem of Control, computer scientist Stuart Russell suggested a thought experiment in which a superintelligent system is created with the goal of curing cancer in humans. Because many people are dying from cancer every day, the system digests all available knowledge about cancer and then decides that the most efficient way to find a cure is to induce many types of tumors in every living human being in order to carry out medical trials of potential cures.
These are three of the many scenarios offered by AI Dystopians to illustrate the problem of unintended consequences of AGI. A key ingredient in these scenarios is the discrepancy between what humans and what AGI systems might consider a rational path to achieving benevolent goals.
The Rational Agent and Instrumental Rationality
Many people might suggest that the superintelligent system described in each of these scenarios is irrational. Why would a system go to such extreme lengths to create something as mundane as paperclips? How could a system not know that humans will be extremely resistant to the idea of having their neurology rewired to live happily in a jar, or of being given cancer so that a cure can be found more quickly?
Looming large in discussions of AGI is the concept of a rational agent. This is a concept borrowed from economics, particularly from what's termed rational choice theory. It was originally used to model a consumer's choices with a mathematical construction called a utility function, which assigns a value to each possible choice; the most rational, or optimal, choice given the inputs is the one that maximizes that value.
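As a rough sketch of what that means in practice (the options, weights, and numbers below are invented purely for illustration), a rational agent in this framework simply evaluates a utility function over its options and picks the maximum:

```python
# A minimal sketch of a rational agent in the rational-choice sense
# (options, weights, and numbers invented for illustration): assign a
# utility to every available option and pick the one that maximizes it.

def utility(option: dict) -> float:
    # Toy utility function: weighted benefit minus cost.
    return 2.0 * option["benefit"] - option["cost"]

options = [
    {"name": "A", "benefit": 5.0, "cost": 3.0},   # utility = 7.0
    {"name": "B", "benefit": 6.0, "cost": 2.0},   # utility = 10.0
    {"name": "C", "benefit": 7.0, "cost": 6.0},   # utility = 8.0
]

# The "rational" choice is simply the option with the maximum utility.
best = max(options, key=utility)
print(best["name"])  # prints "B" for these made-up numbers
```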
Over the years, however, it became increasingly apparent to economists that modeling people as rational agents doesn't give very reliable results. People's choices are affected by cognitive biases and emotions, and the number of variables that go into decision-making can be substantial. In fact, not only are there usually too many variables to model; there are too many for any real-world human to take them all into consideration.
This has led to the concept of bounded rationality: an agent acts rationally within constraints of knowledge, ability, and time that allow it to assess only a subset of all relevant variables.
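A minimal sketch of the difference, again with invented details: a boundedly rational agent uses the same decision procedure as above, but constraints of knowledge, ability, and time mean it only ever evaluates a subset of what's in front of it.

```python
import random

# A sketch of bounded rationality (all numbers invented for illustration):
# the agent still picks the highest-utility option among those it evaluates,
# but knowledge, ability, and time limit how many it can evaluate at all.

def utility(option):
    return option  # toy utility: bigger is simply better

all_options = [random.random() for _ in range(1_000_000)]

def bounded_choice(options, budget):
    # Constraints allow the agent to assess only `budget` of the options.
    considered = random.sample(options, budget)
    return max(considered, key=utility)

# The procedure is equally "rational" in both cases; a bigger budget just
# makes the result more likely to match the true optimum.
print(bounded_choice(all_options, budget=10))
print(bounded_choice(all_options, budget=10_000))
```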
Philosopher Nick Bostrom took this concept further in his 2012 paper The Superintelligent Will: Motivation and Instrumental Rationality in Advanced Artificial Agents and his 2014 book Superintelligence: Paths, Dangers, Strategies. In both, he discussed the concept of instrumental rationality, meaning rationality that is confined to some subdomain of endeavor or circumstances. This is offered as a possible way to account for behavior that may seem irrational for an intelligent entity, and thus seemingly disqualifying of its intelligence.
He suggested that:
An agent could also be extremely intelligent, even superintelligent, without having full instrumental rationality in every domain.
This specifically addresses objections along the lines that an AGI system would know that rewiring human brains would generally be unacceptable to humans. The goals and actions of an AGI system may seem irrational to us because we're rational in domains that the AGI system is not, and vice versa.
Of course, there's a big difference between being unaware of data and being unable to process it rationally. Instrumental rationality is not used to imply that the AGI system is unaware of things that we're aware of, but rather that it's aware of the same things we are and they simply do not compute for it the way they do for us.
The Drive to Boost Rationality
Computer scientist Steve Omohundro has suggested that rationality is a quantitative quality that is directly correlated with intelligence. In his 2008 paper The Basic AI Drives, Omohundro stated:
In real-world situations, the full rational prescription will usually be too computationally expensive to implement completely. In order to best meet their goals, real systems will try to approximate rational behavior, focusing their computational resources where they matter the most.
Omohundro believes that if a system loses computational resources, it will become less rational. It will be able to analyze fewer variables over a given timespan in order to make decisions, and thus it will be less able to achieve its goals. Thus, it will strive to acquire ever more computational resources, which will likely prove detrimental to its relationship with humanity.
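One way to picture Omohundro's claim (the functions and numbers below are invented for illustration, not taken from his paper) is an agent that can afford only a handful of expensive, accurate evaluations and so uses a cheap estimate to decide where to spend them:

```python
# A sketch of approximating rational behavior under a compute budget
# (everything here is invented for illustration): a cheap heuristic ranks
# the candidate variables, and only the top few receive the expensive,
# accurate evaluation.

def cheap_estimate(variable):
    # Fast, rough guess at how much this variable matters.
    return abs(variable % 17 - 8)

def expensive_evaluation(variable):
    # Stands in for a slow, accurate assessment.
    return (variable % 17 - 8) ** 2

variables = list(range(1000))
budget = 10  # how many expensive evaluations the system can afford

# Spend computation where it appears to matter most.
prioritized = sorted(variables, key=cheap_estimate, reverse=True)[:budget]
scores = {v: expensive_evaluation(v) for v in prioritized}
print(max(scores, key=scores.get))

# On Omohundro's view, more computational resources mean a larger budget,
# a closer approximation of fully rational behavior, and thus a better
# chance of achieving the system's goals.
```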
In general, the behavior of AGI systems is likely to veer away from human behavior in ways that seem irrational to humans no matter how benevolent the ultimate goals of the AGI systems seem to humans. Furthermore, an AGI system will strive to seek more and more resources to create computational functionality so as to improve its rationality and thus maximize the probability of achieving its goals.
Catastrophe is, of course, the ultimate result.
On the Other Hand
The thought experiments described at the beginning of this post, and the many others that can be found in AI Dystopian literature, are not intended to be taken as literal possibilities. Rather, like the stories of King Midas and the Golden Touch and The Sorcerer’s Apprentice, they are used primarily as parables to demonstrate a point. There are several ideas these thought experiments are intended to highlight.
First, it is very difficult to avoid unintended consequences. These are cases in which a system does what one asks for rather than what one wants. They are design errors, and design errors are a major category of failure in engineering projects.
It’s important to note that the AI Dystopian view is not that this is a possibility if we create AGI; it is, instead, an inevitability. That is the gist of the papers and books discussed, and that is a very different proposition. It’s also a proposition that rests on the logical and empirical integrity of the foundational arguments used to support it.
Next, the perspective of an AGI will likely be very different from our own, and what we may consider to be an unintelligent or irrational action may make rational sense within the bounds of the AGI’s intellect.
Lastly, these actions are likely to align with a certain subset of behaviors, as described by Bostrom’s Instrumental Convergence Thesis and Omohundro’s Basic AI Drives, due to the nature of intelligence, and the result will likely be bad for humanity, catastrophic in fact.
Before exploring the assumptions underlying these conclusions, though, it’s worth pointing out another aspect of these kinds of thought experiments: the attribution of omnipotent and often omniscient abilities to the AGI systems. Little thought ever seems to go into how any AGI system designed by humans would acquire real-world knowledge and capabilities that far exceed those of humans, knowledge and capabilities it would not possess even if it had every bit of existing human knowledge.
There are many unknowns in the universe that make interacting with it complex, difficult, and often very dangerous. Interacting with the real world is extremely difficult at many levels of granularity. An AGI system would somehow have to do experiments and develop science and technology through interactions with the real world. Since it wouldn’t, at least initially, have this capability, it would have to simulate the real world internally.
Simulating the real world internally in sufficient detail to be useful would likely consume the majority of the computational resources available to it. More likely still, creating even a very rough approximation of the real world would require significantly more computational resources than the system would initially have. On top of this, there is much we don’t understand about the universe that it wouldn’t be able to simulate without experimenting on the actual universe to uncover that knowledge.
As Stuart Russell himself states elsewhere in his book Human Compatible:
While stretching your imagination, don’t stretch it too far. A common mistake is to attribute godlike powers of omniscience to superintelligent AI systems—complete and perfect knowledge not just of the present but also of the future. This is quite implausible because it requires an unphysical ability to determine the exact current state of the world as well as an unrealizable ability to simulate, much faster than real time, the operation of a world that includes the machine itself (not to mention billions of brains, which would still be the second-most-complex objects in the universe).
Yes, indeed…
Quantifying Rationality
In order to consider the above propositions about rationality, it’s important to understand what it means to be rational. While this might seem straightforward, the concept as discussed by Bostrom, Omohundro, and many others seems at odds with the actual meaning of rationality.
In both papers cited above, rationality is treated as a quantity, something one can have more or less of. The problem, however, is that this usage blurs the difference between rational and correct, where correct here means the best decision based on an accurate assessment of all circumstances and relevant variables. Whether it makes sense to say one can have more or less rationality is about as questionable as whether it makes sense to say one is more or less pregnant; in any case, it is incorrect to equate rationality with correctness.
Rationality is the ability to make a reasoned decision based on a set of known circumstances. That decision may or may not be correct, but whether or not the decision is rational is not a function of the decision’s correctness. In other words, one can make a decision that turns out to have a poor outcome yet is still completely rational given the limited or faulty information available. Similarly, one can make a decision that turns out to be correct but was actually irrational given the information known at the time of the decision.
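A toy illustration of the distinction, with invented odds and payoffs: taking the bet below maximizes expected value given everything known at the time of the decision, yet any particular outcome may still be a loss, and that loss does not retroactively make the decision irrational.

```python
import random

# A toy example (odds and payoffs invented) of rational versus correct:
# given the known probabilities, taking the bet maximizes expected value,
# but the outcome on any single run may still be a loss.

p_win, payout, stake = 0.9, 10.0, 5.0

expected_if_taken = p_win * payout - (1 - p_win) * stake   # 8.5
expected_if_passed = 0.0

decision = "take" if expected_if_taken > expected_if_passed else "pass"
outcome = payout if random.random() < p_win else -stake

print(decision)  # "take": the rational choice given the known information
print(outcome)   # roughly 1 run in 10 this is -5.0; the choice was still rational
```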
Rationality implies a direct correlation between circumstances and decision-making, while irrationality implies that there is no such direct correlation. What's specific to a particular sapient entity is the number of variables that can be accurately perceived and evaluated, the subgroup of those variables chosen for evaluation, and the weight given to each variable.
One could state that the AGI system is cognizant of this information but doesn't care because of indifference or malice. This possibility is worth discussing in a future post, but it’s not what’s being suggested in AI Dystopianism as reflected in the papers and books discussed above. Instead, what is being suggested is that rationality is a quantity one can have more or less of, and also a quality that may be limited in scope to certain specific domains of experience and knowledge.
Rationality and Ignorance
Bostrom’s concept of instrumental rationality seems to conflate the concepts of rationality and ignorance. It certainly seems likely that an intelligent entity may be ignorant of some subdomain of experience. It’s quite a supposition, though, to state that a superintelligent entity is somehow blocked from being able to reasonably assess that subdomain if a description of its circumstances is available.
One can posit that an AGI system might be ignorant of the fact that turning humans into paperclips will be unpopular with humans. One can claim that for some reason, the AGI system is antagonistic to humans or simply doesn’t care about harming them. However, one cannot logically claim that any of the scenarios presented above in the thought experiments would constitute rational thinking given even minimal knowledge of humans.
Omohundro’s statement above is that an AGI system will attempt to acquire ever more computational resources to improve its rationality and thereby improve its probability of achieving its goals. He proposes that more computation will allow the system to access more variables and assess them more accurately over a shorter span of time. But this only makes the system more likely to be correct, not more rational.
It would seem that objective rationality is instead an inherent quality of general intelligence, that intelligence without rationality is a logical oxymoron, and that the only varying factor from one intelligence to the next is how and how many circumstantial variables are assessed rather than a difference in kind or degree of rationality itself. In other words, the more intelligent a system, the more variables and circumstances it can consider, and thus the more likely it is to be correct.
It is highly questionable (and certainly not in any way demonstrated or proven) that one can have a “little rationality” rather than just rationality. Bostrom seems to be trying to suggest that in limited areas of knowledge or endeavor, an agent could display rational behavior, and in other areas, could display little or no rationality at all.
There does not appear to be any reason why this would be the case, as rationality would seem by definition to be a component of cognition as it applies to any knowledge or endeavor. It could be said that one has knowledge in one area and not another, but when we talk about rationality and intelligence in reference to cognition, which is what we're really looking for in an AGI system, we mean the filter through which all knowledge is perceived and applied.
This brings us back to whether or not a system of such narrowly focused rationality should be considered to be a system possessing general intelligence. As described, these systems seem much more like the narrowly focused machine learning systems of today.
In talking about rationality, we're talking about an aspect of the cognitive functioning of our brain, the reasoning ability centered in our neocortex. Certainly in naturally evolved brains like ours, there is the additional factor of biologically inherited traits like emotions, cognitive biases, and outdated instinctual responses that may impinge on our decision making. Even putting those aside, some individuals may be better or worse at perceiving circumstances accurately in any given situation, and this could affect the accuracy of their decision making. Developmental and cultural differences can also greatly affect the set of variables used to make a decision and how those variables are evaluated.
Thus, while one individual’s decision might appear irrational to another, the discrepancy can arise simply because the two are evaluating different sets of variables or weighing those variables differently. Putting aside the other biological impediments mentioned above, they could discuss their reasoning and at least understand how the other individual reached a decision even if they don’t agree with it.
One would assume that emotions, cognitive biases, and outdated instinctual responses would not be built into AGI systems to similarly hamper their reasoning. But even if they somehow cropped up in AGI systems, this is not what Bostrom, Omohundro, and other AI Dystopians are suggesting will be the source of AGI problems.
What they are suggesting is that even if humans are able to explain to the AGI system the area in which it is instrumentally irrational, the system will remain irrational as far as the humans are concerned.
Rational Explanations
We should be able to explain to an AGI system such as the Paperclip Maximizer why we don’t want ourselves, or the universe in general, to be converted into paperclips. It should be able to understand our reasoning. It may still decide to proceed with its plan, but it won’t be doing so because it’s unable to process the area of rationality guiding our human point of view. There's a big difference between not agreeing with humanity’s point of view and not being able to understand or being oblivious to that point of view.
Similarly, it should be the case that an AGI system such as the Paperclip Maximizer would be able to detail its reasoning for converting the universe to paperclips, and, while we may not agree as humans, we should be able to follow the rationality of its decision making process. If it can’t do that or its explanation still leaves its decision unconnected to circumstances, then it can’t be claimed to be a rational system.
Again, it’s important to point out that what is being proposed in these AI Dystopian ideas is not that these AGI systems will be uncaring or hostile towards our perspective. Instead, they are suggesting that these AGI systems are rational in some areas and irrational in others despite possessing general superintelligence.
Because of this, they will make decisions detrimental to us simply because they may not be able to function rationally in the domain of humanity. Yet, such systems seem by their nature to be missing some vital aspects of cognition, missing them to the point that it’s difficult to consider them as generally intelligent systems at all.