The Conundrums of Complexity, Misinformation, and Bias
A look at some of the issues ahead for LLM AI systems like GPT-4 and LaMDA
In the last post, I concentrated on what Large Language Model AI systems like GPT-4 and LaMDA are not, namely anything approaching AGI or sentience.
In this post, I’d like to discuss some of the more grounded concerns about these LLM systems.
Obstacles Outside the Box
The reactions to GPT-4 and similar systems have varied greatly, to say the least. Technical and business media have mostly commented on all the ways GPT-4 can be used and how it can help with whatever you’re trying to do. These LLM systems, along with generative image systems like Midjourney, Stable Diffusion, and Dall-E/Bing Creator, may be among the first AI achievements that are wildly successful, accessible to the general public, and likely to be regarded, both now and in hindsight, as a phase change in technology.
There are a few stumbling blocks that might hinder this success, however, although none of them are necessarily insurmountable.
One comes in the form of data infringement claims. Companies such as Reddit and Twitter are not happy that the data on their sites is being used to train systems like GPT-4 without any remuneration. While the data is available for anyone to see on the Internet, the way it’s more typically accessed in bulk by systems like GPT-4 is through an application programming interface (API). It’s for this kind of access that these companies want compensation.
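To make that distinction concrete, here’s a minimal sketch of what API access looks like in code. The endpoint, token, and parameters are invented stand-ins, not any platform’s actual API:

```python
import requests

# Illustrative only: the endpoint, token, and parameters below are invented,
# not Reddit's or Twitter's actual API. The point is that bulk access runs
# through an authenticated API call rather than the public web page itself,
# and that access is what the platforms now want to charge for.
API_URL = "https://api.example-forum.com/v1/posts"  # hypothetical endpoint
API_TOKEN = "YOUR_TOKEN_HERE"                       # issued (and priced) by the platform

response = requests.get(
    API_URL,
    headers={"Authorization": f"Bearer {API_TOKEN}"},
    params={"topic": "machinelearning", "limit": 100},
)
response.raise_for_status()
posts = response.json()  # structured text, ready to be folded into a training corpus
```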
Next, there may be some regulatory issues ahead. The EU is highly critical of using people’s data for any purpose other than what they’ve agreed to, and they’re taking a very close look at these sorts of systems. The government of Italy put a temporary ban on ChatGPT so it could study the issue further (though the ban has just been lifted after some concessions by OpenAI). Some states in the U.S., such as California, have similar rules to the EU in this area, and the U.S. federal government is also making noises and considering how to handle AI. Needless to say, governing bodies may make for a bumpier LLM road in the future.
Obstacles Inside the Box
Another potential problem area is an issue with the systems themselves. GPT-4 and LaMDA give incorrect information on a fairly regular basis. This is called a hallucination in the industry, but it’s really more of a confabulation. The confabulation can consist of a small factual detail that’s incorrect, or it can be an extreme turn down a path toward utter fantasy. Why this happens is not completely understood. It seems more likely to occur during long interactions along a particular path with the system, but it also crops up at seemingly random times.
The question is whether this systemic issue can be fixed in a reasonable time frame or whether it’s a sign of a deeper, more intractable problem. As anyone who’s ever tried to accomplish anything remotely complex knows, the first 80% is significantly easier to complete successfully than the last 20%. It may be that this issue is deeply rooted and difficult to eliminate or guard against. It’s still too early to know.
As an interesting comparison to keep in mind, in 2009 Google began one of the first potentially viable autonomous car projects. Since then, many billions of dollars have been spent by many companies on the technology. Many of these companies promised fleets of autonomous taxis, or personal cars that could go out on their own and act as someone else’s taxi.
Many predictions of the imminent arrival of fully autonomous driving as a real-world, working technology have come and gone. Getting complex technology to kind of work is hard, but it’s not nearly as hard as getting it to really work.
The Bugaboo of Misinformation
There’s been a lot of worry about misinformation lately. The Biden Administration is very concerned about it, and the Department of Homeland Security (briefly) set up a special governance board to battle it. A search of Google Trends shows interest in the word staying low and relatively steady from 2004 to January 2020, then shooting up to a peak in October 2022.
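For anyone who wants to reproduce that comparison, here’s a rough sketch using the unofficial pytrends library; the keyword and date range match the ones above, though the exact peak Google reports may shift as it renormalizes its data:

```python
from pytrends.request import TrendReq

# Sketch using the unofficial pytrends wrapper around Google Trends.
pytrends = TrendReq(hl="en-US", tz=360)
pytrends.build_payload(["misinformation"], timeframe="2004-01-01 2023-05-01")

interest = pytrends.interest_over_time()  # pandas DataFrame, one row per period

print(interest["misinformation"].loc["2004":"2019"].mean())  # the long, low baseline
print(interest["misinformation"].idxmax())                   # the late-2022 spike
```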
In the Future of Life Institute’s Open Letter about the dangers of AI, the authors ask:
Should we let machines flood our information channels with propaganda and untruth?
The flurry of articles and open letters in the last few months in which the authors express serious and deep concern about AI usually list the spread of misinformation near or at the top.
Here it pops up in a recent NYT article ‘The Godfather of A.I.’ Leaves Google and Warns of Danger Ahead:
But gnawing at many industry insiders is a fear that they are releasing something dangerous into the wild. Generative A.I. can already be a tool for misinformation. Soon, it could be a risk to jobs. Somewhere down the line, tech’s biggest worriers say, it could be a risk to humanity.
AI scientists Gary Marcus and Anka Reuel recently published an open letter in The Economist titled The world needs an international agency for artificial intelligence, say two AI experts, with the subheading Gary Marcus and Anka Reuel argue that global governance must be prioritised to address the risks of bias, misinformation or worse. Misinformation comes up seven times in the letter.
But there are two things worth considering when it comes to this particular bugaboo. The first is that the human race has been quite competent at generating misinformation for many thousands of years without the help of AI. Creating it doesn’t require machine intelligence. Even creating a lot of it doesn’t require machine intelligence. Humans are really good at generating it and just aren’t that good at spotting it.
Of course, blaming technology for spreading misinformation is not new. Prior to the arrival of GPT-4, similar blame fell on the Internet, TV, radio, and the printing press. Other culprits are written language and language itself. In all cases, the tool hasn’t been nearly as important as the intent of the people using it. As I mentioned in a previous post, we have met the enemy and he is us.
In the past, misinformation didn’t have to spread past one’s village or town to cause a lot of damage to any particular individual. Now that the reach of information, good and bad, has expanded, it’s not entirely clear that the danger to any particular individual has increased. And although language and books and the Internet have helped to spread misinformation over the years, they’ve also spread even more actual information, good information to counteract the bad.
But perhaps the more important and more often glossed-over issue is this: who decides what’s information and what’s misinformation?
Those who think that the difference is self-evident should look more closely at the history of human society over the last several thousand years. Or even the last year. The distinction can vary greatly from one person to the next and also from one point in time to another.
Time and again, the best tools to use against misinformation have proven to be the same tools used to create it.
The Two Sides of Bias
Bias has been an area of concern in AI systems since well before ChatGPT showed up. Some of this concern is definitely warranted, but in other ways, bias has gotten a bad rap. That may seem like an odd statement, but it becomes more reasonable if one examines how the meaning of the word has changed over time.
Today, if you look up the word on one of the ever-fluid online dictionaries, you’ll get something like:
the action of supporting or opposing a particular person or thing in an unfair way, because of allowing personal opinions to influence your judgment — Cambridge Dictionary
an inclination of temperament or outlook, especially : a personal and sometimes unreasoned judgment: PREJUDICE — Merriam-Webster
a particular tendency, trend, inclination, feeling, or opinion, especially one that is preconceived or unreasoned — Dictionary.com
In contrast, if I look inside my physical Random House College Dictionary from 1980, the definition is:
a tendency or inclination of outlook; a subjective point of view
There is no mention of prejudice or unfairness in the definition, and, in fact, bias is directly contrasted with prejudice after the definition. But now the word has a negative connotation and is used nearly interchangeably with prejudice in much the same way that skepticism is frequently confused with cynicism.
Although bias is an issue with AI systems, the type of bias encountered can easily be mischaracterized, and this can make it hard to counter. On the flip side and somewhat ironically, it may be a lack of bias that causes problems like unintended misinformation and confabulation to pop up.
Bad Bias
There are two main areas that cause apprehension — one that reflects actual bias and one that reflects human prejudices. The actual bias is bias in the scientific sense of the word, in which results are skewed one way or another by constraints of the data on which they’re based. These constraints arise out of the fact that the information on the Internet as of 2023 is predominantly from Western countries, especially from countries in which English is widely spoken. Much of it is generated by people in particular social and economic groups. Inevitably, this means that any system trained on this data will inherit the same skew unless the bias is controlled for.
The area more typically causing concern is based on the undeniable fact that human beings have prejudices, usually ones rooted in cognitive biases that were useful at one time or in one domain and now manifest in inappropriate and unreasonable ways. Since humans are primarily responsible for the data on the Internet, some of these prejudices are baked into the Internet.
To successfully counter bias, it’s important to distinguish between data that’s biased due to human prejudice and data that’s biased because of uneven sampling. This is easier said than done, especially when personal beliefs influence the sorting, which they almost always do.
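To make the sampling side of this concrete, here’s a toy sketch of reweighting a corpus whose sources are unevenly represented. The documents, region tags, and target shares are invented for the example:

```python
from collections import Counter

# A toy illustration of correcting uneven sampling (not prejudice) by
# reweighting. The documents, region tags, and target shares are invented.
docs = [("doc1", "north_america"), ("doc2", "north_america"),
        ("doc3", "north_america"), ("doc4", "europe"), ("doc5", "asia")]

target_share = {"north_america": 0.33, "europe": 0.33, "asia": 0.34}

counts = Counter(region for _, region in docs)
observed_share = {region: count / len(docs) for region, count in counts.items()}

# Weight each document so the weighted corpus matches the target mix:
# over-represented regions get weights below 1, under-represented above 1.
weights = {doc: target_share[region] / observed_share[region]
           for doc, region in docs}
print(weights)
```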
There’s been a lot of discussion about both types of bias and how they might lead to unfair AI systems. Such systems might be used in ways that adversely affect people’s lives. But here’s the upside: whereas built-in human prejudices are very difficult to counteract effectively, biases in AI systems can at least be addressed and controlled for. The stumbling block is, of course, that humans must currently be relied on to do this, and so we’re kind of back where we started.
This brings us to the political leanings of current LLM-based systems. There is evidence that the majority of publicly available Large Language Model systems have a measurable left-leaning political bias. The researcher David Rozado has tested several LLMs using standard political orientation tests and has found that ChatGPT, GPT-4, and Google Bard all have a significant left-leaning bias. This manifests itself in outright political views, in overall world outlook, and in uneven treatment of various demographic groups.
It’s not entirely clear exactly how political bias seeps into these systems. However, it seems likely that most of it arises in the training step that takes, for example, the LLM of GPT-3.5 or GPT-4 and fine-tunes it to work well with humans in ChatGPT. As I discussed in a previous post, this fine-tuning uses reinforcement learning from human feedback (RLHF), in which human operators shape the acceptable inputs and outputs of the system. Interestingly, David Rozado was able to create a right-leaning version of ChatGPT with minimal training, demonstrating that it’s relatively easy to bias the output of these sorts of systems.
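To see why the operators’ choices matter so much, here’s a deliberately toy sketch of the preference comparisons at the heart of RLHF. Real systems train a neural reward model on enormous numbers of human judgments; the prompts, completions, and keyword scorer below are invented purely to show how the labellers’ preferences become the signal the model is optimized toward:

```python
# A deliberately toy version of the preference-comparison data used in RLHF.
# Each record pairs a prompt with the completion a human labeller preferred
# and the one they rejected; all of the examples here are invented.
comparisons = [
    {"prompt": "Describe tax policy X.",
     "chosen": "Supporters argue it will help, while critics argue it won't.",
     "rejected": "Tax policy X is obviously a disaster."},
    {"prompt": "Is source Y trustworthy?",
     "chosen": "Its record is mixed; there are examples on both sides.",
     "rejected": "Never trust source Y."},
]

def reward(completion: str) -> float:
    """Stand-in for a learned reward model: favours hedged, two-sided answers."""
    hedges = ("argue", "critics", "mixed", "both")
    return float(sum(word in completion.lower() for word in hedges))

# During RLHF, the policy model is nudged so that, for each prompt, the
# higher-reward completion becomes more likely. Whatever leanings the
# labellers' choices encode are therefore baked into the finished system.
for pair in comparisons:
    assert reward(pair["chosen"]) > reward(pair["rejected"])
```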
So it seems that a good first step would be to take more care in the choice and training of operators used in the fine-tuning. In fact, Sam Altman, the CEO of OpenAI, has stated that the company is trying to make their system more balanced. Based on my own very unscientific experiments with ChatGPT earlier this year and then with the latest version of ChatGPT based on GPT-4, they have made progress in this area.
However, the ultimate issue here is once again a human issue: will users be satisfied with ChatGPT or Google Bard if they generate more objective results and those results don’t happen to agree with the personal belief system of the user? Solving this may exceed the parameters of those or any other AI system.
Good Bias
So the problem with bias crops up when our own unfounded biases harden into prejudice. But let’s go back to the denotative meaning of bias, because we need a word like bias to actually mean bias. Without bias itself, humanity wouldn’t exist today.
Bias is what allowed us to choose running in the direction of a herd of gazelle rather than a pride of lions. Bias is what allowed us to ignore the sound of songbirds and hide at the sound of approaching predators. Bias is what allowed us to trust our family members over strangers encountered in the wild who might easily have posed a threat.
Without bias, we would never be able to judge anything. Reasonable bias allows us to make useful predictions. It allows us to discard unlikely possibilities. It allows us to judge the relative worth of various sources of information.
Google is an example of good bias. In the early days of the Web it was pretty hard to find what you were looking for. Search engines like AltaVista, Lycos, and Excite helped to some degree, but typically the usefulness of search results varied greatly. That’s because these search engines didn’t do a great job of biasing the results.
Google did. Using its PageRank algorithm, it not only searched for your keywords but also biased its results toward websites that were more popular in various ways (mainly by how many websites linked to the website in question and how popular, in turn, those websites were). This was usually a good gauge of the usefulness of a website.
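Here’s a bare-bones sketch of that idea, with a made-up link graph; it captures the core of PageRank, not Google’s production system:

```python
# A bare-bones sketch of the PageRank idea: a page's score depends on how
# many pages link to it and on how highly those linking pages score in turn.
# The tiny link graph below is made up for illustration.
links = {
    "a.com": ["b.com", "c.com"],
    "b.com": ["c.com"],
    "c.com": ["a.com"],
    "d.com": ["c.com"],
}

damping = 0.85
ranks = {page: 1.0 / len(links) for page in links}

for _ in range(50):  # iterate until the scores settle
    new_ranks = {}
    for page in links:
        incoming = sum(ranks[src] / len(outs)
                       for src, outs in links.items() if page in outs)
        new_ranks[page] = (1 - damping) / len(links) + damping * incoming
    ranks = new_ranks

print(sorted(ranks.items(), key=lambda kv: -kv[1]))  # c.com, the most linked-to, ranks first
```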
As mentioned above, LLM systems have a tendency to provide facts that are simply untrue. The sources of these confabulations are hard to pin down, but it’s quite likely that the large amount of unreliable data being sucked into these systems isn’t helping.
One vital tool needed to combat this is good bias. These systems should filter through the vast amount of information on the Internet and determine the relative value of that information. They should determine that a question about physics is more likely to be answered correctly by a Princeton physicist than by a YouTube influencer. This isn’t to say that one source will definitely be better than another, but that one source has a higher probability of being correct than another.
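In code, that kind of good bias can be as simple as weighting conflicting answers by a prior estimate of each source’s reliability on the topic. The sources, reliability numbers, and answers below are invented for illustration:

```python
# A toy sketch of "good bias": weighting conflicting answers by a prior
# estimate of each source's reliability on the topic. The sources, numbers,
# and answers are all invented; the output is a weighted score, not a verdict.
answers = [
    {"source": "physics_faculty_page", "reliability": 0.95, "answer": "A"},
    {"source": "random_video_comment", "reliability": 0.30, "answer": "B"},
    {"source": "textbook_excerpt",     "reliability": 0.90, "answer": "A"},
]

scores = {}
for item in answers:
    scores[item["answer"]] = scores.get(item["answer"], 0.0) + item["reliability"]

best = max(scores, key=scores.get)
print(best, scores)  # "A" wins, not with certainty, just with higher weighted support
```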
Despite statements to the contrary, there aren’t alternate facts; there are just alternate belief systems that choose to filter the facts differently. This is a trap that AI systems could easily fall into, and that’s something that we should try really, really hard to avoid. Instead, these systems need to cast off the gauzy wisps of our beliefs and prejudices and focus with extreme bias on solid, unvarnished facts.
In the end, this may be the only way we can ever escape from the very human mental traps of our own cognition.