Technical terms undergo the same definitional drift as words in any language, but, like much in the world of today’s technology, the shift seems to happen a lot faster than it used to.
From Human Intelligence to Robot Vacuums
The term Artificial Intelligence was originally used to refer to technology that would be, for the most part, functionally equivalent to human intelligence. As mentioned in the intro post for this blog, the term AI was coined by a group of scientists in 1955 who hoped to spend a summer at Dartmouth banging through the difficulties of simulating human intelligence.
As time went by and the task proved more difficult than expected, shifting the meaning of AI became more tempting than accomplishing its original goals. Inevitably, the term drifted to mean technology with capabilities vaguely similar to human intelligence in very specific areas.
Artificial General Intelligence came to be a generally accepted term in the first decade of this century. The main popularizers of it in its early days were computer scientists Ben Goertzel and Shane Legg.
Goertzel was one of the primary organizers of the first conference specifically for AGI in 2007. In his paper for that conference (which became the first chapter of the book compiling its proceedings), he defined AGI and what distinguished it from AI:
The vast bulk of the AI field today is concerned with what might be called “narrow AI” – creating programs that demonstrate intelligence in one or another specialized area, such as chess-playing, medical diagnosis, automobile-driving, algebraic calculation or mathematical theorem-proving. Some of these narrow AI programs are extremely successful at what they do. The AI projects discussed in this book, however, are quite different: they are explicitly aimed at artificial general intelligence, at the construction of a software program that can solve a variety of complex problems in a variety of different domains, and that controls itself autonomously, with its own thoughts, worries, feelings, strengths, weaknesses and predispositions.
Very specifically, he described work on AGI as “the creation of software programs displaying broad, deep, human-style general intelligence.”
He also noted that the shift in focus of AI was a fait accompli, which was the reason for coining the new term:
Artificial General Intelligence (AGI) was the original focus of the AI field, but due to the demonstrated difficulty of the problem, not many AI researchers are directly concerned with it anymore.
With the introduction of modern machine learning techniques and recent breakthroughs such as the Transformer Architecture, Latent Diffusion Models, and Large Language Models, many of the early limitations of AI have been overcome. LLMs such as OpenAI’s GPT-4, Google’s Gemini, and other similar systems are certainly way beyond the capabilities of the narrow AI that Goertzel referred to above.
A reasonable question one might then ask is: if they are not the narrow AI that Goertzel was referring to, what exactly are they?
The Qualitative Nature of Human Intelligence
In a previous post, I suggested a functional definition of intelligence to counterbalance the often short and not very helpful definitions that seem widespread in the field of AI. That definition is:
Intelligence is that quality which allows an entity to solve a wide range of deductive and inductive problems, extract and prioritize information from the environment, infer causal as well as correlative relationships from both small and large data sets over many known and novel domains, generalize knowledge from a known domain to another known or novel domain, extrapolate probable outcomes from both factual and counterfactual circumstances, recognize in its own cognition both the potential for fallacies and the fallacies themselves, synthesize existing knowledge to form original concepts, and acquire awareness of its own cognition and of itself as an independent and unique entity distinct from other entities and from its environment.
So how well do LLMs stack up to this definition?
Solve a wide range of deductive and inductive problems: LLMs do a pretty good job of this as long as they’ve had extensive and wide-ranging training data in the domains of those problems. Straight LLM systems (i.e., not hybrid AI systems) still struggle with logic and math, as they don’t have any internal model of the underlying logic or math. Instead, they’re doing statistical analysis of the math and logic data that they’ve ingested.
Extract and prioritize information from the environment: LLMs are relatively good at doing this after significant training, although they sometimes return information more generic than one might hope, depending on how much training data covered that area. They are not very good at doing this when the information they’re fed after training does not match what was in the training data.
Infer causal as well as correlative relationships from both small and large data sets over many known and novel domains: This one is a little more complicated. LLMs are good at inference based on the large data sets they’ve ingested. With an extensive enough training dataset, they are relatively good at inference from a smaller amount of data, although not always as accurately as a human. They are pretty poor at doing this in novel domains, i.e., domains in which they’ve ingested limited or no training data.
Generalize knowledge from a known domain to another known or novel domain: Things are getting a little trickier to pin down now. LLMs are not good at doing this with novel domains, domains of knowledge they have not ingested during training. I suspect that they are also not really generalizing knowledge from one known domain to another known domain, but are instead ingesting enough data in both domains and applying that knowledge separately.
Extrapolate probable outcomes from both factual and counterfactual circumstances: LLMs are pretty hit and miss on this. This is an area where their lack of actual comprehension of the world and the data they ingest frequently becomes apparent.
Recognize in its own cognition both the potential for fallacies and the fallacies themselves: LLMs fail pretty badly at this. They’ll provide completely false answers to questions without hesitation, and they are unable to recognize when they’ve made a mistake or when they are likely to have made one. Fixes are being implemented to try to get around this, but they really amount to a patch over the problem rather than a fix to the underlying system.
Synthesize existing knowledge to form original concepts: LLMs are completely unable to do this. Some have claimed that LLMs display original concepts in the form of creativity. However, this seems unlikely given their architecture, and there is no reason to think that they’re doing anything other than pseudo-random remixes of their training data.
Claims of creativity first started appearing shortly after Google DeepMind’s AlphaGo system beat the world champion Go player. Some said that since several of the system’s moves had never been observed before, they were the result of creative thinking on the part of the system. While perhaps possible (though very unlikely), a much more straightforward explanation is that the number of possible moves is incredibly large, and the system is able to analyze them better and faster than a human. It was also able to play millions of games against itself during training, many more than any human could ever play. This means that it was not only possible but likely that the system would come up with some moves that seemed novel to humans.
Acquire awareness of its own cognition and of itself as an independent and unique entity distinct from other entities and from its environment: Again, LLMs are completely unable to do this. Despite the ease with which some are fooled into believing otherwise, there is absolutely no evidence that they have the slightest comprehension of what they’re doing, what they are, or the nature of the world around them.
Admittedly, this is my own definition of intelligence, and others might disagree with it. However, the short and snappy definitions frequently used for intelligence are so non-specific that one could argue they apply to many things we don’t consider intelligent in the way humans are. The point of my definition is to be very specific about what we’re talking about and to offer a definition that most people would agree describes many aspects of human intelligence.
The Unfortunate Haziness of Words
It’s become apparent to pretty much everyone that Generative AI systems are extremely powerful and will provide a multitude of useful tools to humanity. But while current systems are a significant improvement over AI systems of the past, they are also not really AGI, at least not as the term was originally conceived. Instead, they’re something in between old-school AI and AGI.
The terms learning and reasoning are thrown around frequently when talking about LLMs and what they’re doing, but this is something of a misnomer. When an LLM ingests datasets, it is building a weighted, hierarchical statistical model of that data. By ingesting huge quantities of data, it is able to build up a fairly accurate statistical representation of it. When it responds to queries, it uses this statistical model to calculate the most likely response to its input (with a little randomness thrown in). That is vaguely similar to, but significantly different from, the process by which humans learn and reason.
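To make the “most likely response with a little randomness thrown in” point concrete, here is a minimal sketch (in Python, using NumPy) of the sampling step at the end of a language model’s generation loop. The tokens, scores, and temperature value below are invented for illustration; in a real model those scores come out of billions of learned weights rather than a hand-written list.

```python
import numpy as np

def sample_next_token(logits, temperature=0.8):
    """Pick the next token from the model's scores: usually the likeliest one,
    with a little randomness thrown in (controlled by temperature)."""
    scaled = np.array(logits) / temperature        # lower temperature = less random
    probs = np.exp(scaled - np.max(scaled))        # softmax, shifted for numerical stability
    probs /= probs.sum()
    return np.random.choice(len(probs), p=probs)   # choose a token index by its probability

# Hypothetical scores a model might assign to candidate next tokens
# after the prompt "The die landed on":
vocab = ["six", "the", "a", "table", "fire"]
logits = [2.1, 0.3, 0.2, 1.4, -1.0]

print(vocab[sample_next_token(logits)])  # usually "six", sometimes "table", rarely "fire"
```

Nothing in that loop consults a model of dice, tables, or gravity; it only consults the statistics baked into the scores.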
LLMs are said to know things, understand things, have general knowledge, infer and deduce things, etc. All these terms suggest that the LLM is doing something similar to what humans do, but this is just not the case.
A rough analogy can be made to demonstrate the difference between how an LLM works and how the human brain works. This isn’t meant to be a description of any actual implementations of LLMs, just a general analogy of the kind of differences between the two.
Let’s say you are presented with a 6-sided die and have no previous knowledge of dice. Then you are asked: how likely is it that any one particular side will land face up, versus any other side, if the die is thrown in the air and lands on a table?
The LLM has ingested no information about dice in its training data, so it would have to be fed data on the results of the die being tossed onto the table. If there is only one throw in this new dataset, it would conclude that there is a 100% chance that the side that landed face up will always land face up. If there are 10 throws, it will, on the basis of its statistical analysis, have a rough idea of how likely each side is to land face up, though with a large margin of error at a sample size of only 10. With 100 throws, the margin of error decreases; with 1,000 it decreases a lot more. By the time it has data on 10,000 throws, it will have a very small margin of error and can correctly conclude that there is an equal chance of any particular side landing face up.
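As a rough sketch of that purely statistical route to the answer, here is a small simulation of a frequency-based learner estimating the probability of one face from nothing but observed throws. The sample sizes mirror the ones above; the 95% margin of error uses the standard normal approximation for an estimated proportion, and the function name is just for this illustration.

```python
import random
import math

def estimate_face_probability(num_throws, sides=6):
    """Estimate P(face == 1) purely from observed throws, the way a
    frequency-based learner would, with a 95% margin of error."""
    throws = [random.randint(1, sides) for _ in range(num_throws)]
    p_hat = throws.count(1) / num_throws                         # observed frequency of one face
    margin = 1.96 * math.sqrt(p_hat * (1 - p_hat) / num_throws)  # normal-approximation 95% half-width
    return p_hat, margin

for n in (10, 100, 1000, 10000):
    p, m = estimate_face_probability(n)
    print(f"{n:>6} throws: P(face 1) ≈ {p:.3f} ± {m:.3f}")

# The estimate only settles near the true 1/6 ≈ 0.167 as the sample grows;
# someone who has looked at the die's symmetry never needed the throws at all.
```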
A mature human would take a look at the die and realize the answer right away. This realization would be based on their intuitions of spatial relationships, geometry, hard body dynamics, and gravity developed fairly rapidly from birth. There is some debate as to when and how these intuitions come into being, but there’s a good chance that a lot of it is hardwired into the brain and fine-tuned during early childhood.
Both the human and the LLM system come to a correct answer, but they do it in very different ways. If you were to switch from a 6-sided die to a 20-sided die, the whole statistical analysis process would have to be repeated for the LLM. The human would look at the 20-sided die and immediately realize that the result will be the same as with a 6-sided die.
This is illustrative of the qualitative difference between the way human intelligence works and the way LLMs work. Cognitive scientist and Substack blogger Gary Marcus recently put up a good post pointing out some of the evidence demonstrating pretty explicitly that Generative AI relies on statistical analysis rather than any sort of reasoning or understanding, and how that affects its capabilities.
The Semantic Slide
Which brings us to a phenomenon that seems to be cropping up more and more lately, in which the ability to apply Artificial Intelligence techniques to various general areas of interest is being equated with the near or outright creation of Artificial General Intelligence. Examples range from statements that “we’re well on the way to AGI and it’s just around the corner” to “we’ve pretty much already achieved AGI.”
The source of this phenomenon starts with suggestive statements by those at the top of the field. As an example, two of the leaders of Google DeepMind recently tweeted about a new system the company had developed called AlphaGeometry, which achieved significantly better performance than previous systems in solving hard problems in Euclidean Geometry — performance comparable to a prize-winning, high school-level human mathematician.
Demis Hassabis, the CEO and co-founder of Google DeepMind, sent out a tweet in January of 2024 stating:
Congrats to the #AlphaGeometry team on their impressive breakthrough published today in @Nature using a neuro-symbolic hybrid AI system that can solve Maths Olympiad-level geometry problems. Another step on the road to AGI.
(The last sentence was later deleted.)
Shane Legg, Chief AGI Scientist and co-founder of Google DeepMind, also sent out a tweet that day:
As someone who still vividly remembers trying to solve crazy hard geometry problems at the New Zealand IMO training camp in Christchurch way back in 1990... it kind of blows my mind to see how good AI has become at this! AGI keeps getting closer.
Google DeepMind achieved its success with AlphaGeometry by combining an LLM with a symbolic engine: a logic-based module, similar to old-school AI, that uses rules and symbolic manipulation to make logical deductions. According to Google DeepMind and some others, this was akin to the reasoning ability of humans and might very well lead to human-type reasoning in other areas.
That’s quite a claim, and it seems more likely that what they achieved is akin to just one type of human logical deduction. Reasoning in humans, after all, is a many-faceted capability. Time will tell whether what they did is really equivalent to human reasoning and whether it can be applied to different domains, particularly domains significantly distant from Euclidean geometry.
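To make the division of labor in that kind of hybrid concrete, here is a deliberately toy sketch of the general neuro-symbolic loop: a rule-based engine exhausts what it can deduce, and a generative “proposer” fills the gap with an auxiliary suggestion. Every name, rule, and fact below is invented for illustration and says nothing about AlphaGeometry’s actual implementation.

```python
def deduce(facts):
    """Symbolic engine: apply one toy rule until nothing new can be added.
    Rule: parallel lines cut by a transversal give equal alternate angles."""
    facts = set(facts)
    while True:
        before = len(facts)
        if ("parallel", "AB", "CD") in facts and ("transversal", "EF") in facts:
            facts.add(("equal_angles", "1", "2"))
        if len(facts) == before:
            return facts

def propose(facts):
    """Stand-in for the generative model: suggest an auxiliary fact the
    symbolic engine could not reach on its own (hard-coded here)."""
    if ("parallel", "AB", "CD") not in facts:
        return ("parallel", "AB", "CD")
    return None

facts = {("transversal", "EF")}
goal = ("equal_angles", "1", "2")

for _ in range(5):                 # alternate deduction and proposal a few times
    facts = deduce(facts)
    if goal in facts:
        break
    suggestion = propose(facts)
    if suggestion is None:
        break
    facts.add(suggestion)

print("goal proved" if goal in facts else "goal not proved")
```

The point of the pattern is only that the statistical component and the symbolic component cover each other’s weaknesses; whether that adds up to human-style reasoning is exactly the question raised above.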
A somewhat more explicit statement was made in a January 18th Instagram video post from Mark Zuckerberg outlining Meta’s plans to buy a lot of NVIDIA GPUs to create future products:
It’s become clearer that the next generation of services requires building full general intelligence, building the best AI assistants, AI for creators, AI for businesses and more.
It’s unlikely that Zuckerberg is suggesting that Meta’s next generation of services will require “the creation of software programs displaying broad, deep, human-style general intelligence” and even less likely that he was suggesting that Meta will be developing a system “that controls itself autonomously, with its own thoughts, worries, feelings, strengths, weaknesses and predispositions.” What he seems to mean is that Meta is planning on developing products using AI that’s somewhat better than today’s and that can be used in a number of different capacities.
It seems like this change in direction on the meaning of general intelligence started this year. As recently as November of 2023, Sam Altman, CEO and co-founder of OpenAI, gave a much more reasoned discussion of this topic in a Fellowship Lecture at the Oxford Union. When asked whether he thought the path to general intelligence meant just improving our current LLMs or whether another breakthrough would be necessary, he answered:
I think we need another breakthrough. I think we can push on large language models quite a lot and we should, and we will do that. We can take our current hill that we're on and keep climbing it, and the peak of that is still pretty far away. Within reason. I mean, you know, if you push that super, super far, maybe all this other stuff emerges. But within reason, I don't think that will do something that I view as critical to an AGI. To stick with that example from earlier in the evening in physics — let's use the word superintelligence now — if a superintelligence can't discover novel physics, I don't think it's a super intelligence. Training on the data of what you know, teaching it to clone the behavior of humans and human text — I don't think that's going to get there and so there's this question, which has been debated in the field for a long time, of what do we have to do in addition to a language model to make a system that can go discover new physics. And that'll be our next quest.
This seems like a very reasonable take.
However, once the new year rolled around, Altman seemed to swerve a bit on this. In an Axios interview at Davos in January, he responded to a question about what he sees happening in this new year:
There are all these things that can happen, and I'd love to talk about sort of all the specifics. But the general principle, I think the thing that matters most, is just that it gets smarter. So GPT-2 couldn't do very much. GPT-3 could do more. GPT-4 could do a lot more. GPT 5 will be able to do a lot, lot more, or whatever we call it, and the thing that matters most is not that it can, you know, have this new modality or it can solve this new problem, it is the generalized intelligence keeps increasing and we find new ways to put that into a product.
I have no idea if this is some new marketing decision or an actual change in perspective. What is apparent, though, is that this change is being snatched up and gnawed on by news outlets and online independent media. While leaders in the field limit themselves to suggestive semantic dances around the topic, many others are declaring that AGI is pretty much in the bag:
Sam Altman STUNS Everyone With GPT-5 Statement | GPT-5 is "smarter" and Deploying AGI..
Sam Altmans SECRET Plan For AGI - "Extremely Powerful AI is close"
Raising $7T For Chips, AGI, GPT-5, Open-Source | New Sam Altman Interview
Sam Altman Says AGI Soon and AGI Will Help People Do A LOT More
SAM ALTMAN SAYS HUMAN-TIER AI IS COMING SOON
The Path From AI to AGI
The question remains whether AGI can result from an extension of Generative AI or whether something else is necessary.
Back in that 2007 paper, Ben Goertzel wrote:
The presupposition of much of the contemporary work on “narrow AI” is that solving narrowly defined subproblems, in isolation, contributes significantly toward solving the overall problem of creating real AI. While this is of course true to a certain extent, both cognitive theory and practical experience suggest that it is not so true as is commonly believed. In many cases, the best approach to implementing an aspect of mind in isolation is very different from the best way to implement this same aspect of mind in the framework of an integrated AGI-oriented software system.
It’s possible that LLMs work the same as human intelligence and can just be ramped up to equal it, but the evidence suggests otherwise.
As Sam Altman himself stated at Oxford, we’re looking for something in AGI that is completely absent from current AI systems. We’re looking for the kind of intelligence that can discover new physics. Einstein isn’t famous because he read all the physics work out there and was then able to write short summaries of it or solve already-solved problems.
What he came up with wasn’t there waiting to be analyzed and correlated; it was something beyond the data, something that no one before him had yet conceived of. He famously imagined what it would be like to chase a beam of light and travel alongside it, and this led him to discover something new and unique and monumental.
In other words, AGI should not only be capable of answering our questions, it should be capable of asking new ones that we’ve never imagined.
Postscript
Just after this post was written, OpenAI announced Sora, a text-to-video system it had developed. Sora isn’t available to the public yet, but OpenAI did post some very impressive videos showing Sora’s ability to create comparatively long (up to a minute), high-fidelity videos from text prompts.
Sora does represent a significant improvement over previous text-to-video systems. It’s really impressive.
However, despite many claims to the contrary, nothing about it indicates that it possesses anything like “understanding” of the physical world. As Gary Marcus again points out in a post, it makes a lot of strange errors.
No doubt its performance will be improved, but what these errors expose and what Sora’s technical paper pretty much confirms is that it’s still just very advanced statistical analysis of large data sets. The dice analogy still applies.
Unfortunately, at the end of OpenAI’s announcement was the following:
Sora serves as a foundation for models that can understand and simulate the real world, a capability we believe will be an important milestone for achieving AGI.
Sam Altman then gently fanned the flames with a tweet at the time of the announcement proclaiming that OpenAI was “extremely focused on making AGI.”
While this mention of AGI is relatively tame, it’s only fed rampant new claims about the arrival of AGI:
OpenAI Introduces SORA. | AGI is here.
AGI in 7 Months! Gemini, Sora, Optimus, & Agents - It's about to get REAL WEIRD out there!
OpenAI just dropped the biggest bomb on AGI— Sora
Sora: OpenAI’s Leap Towards AGI through Text-to-Video AI Innovation
And so it goes…
Sora’s announcement has led some to double down on the idea that just scaling up training data and compute resources with our current techniques will lead to AGI. Yet, this idea simply has no foundation in empirical data or even in theory.
It’s reminiscent of the “reaching the moon” fallacy. The Wright Flyer brought humanity a step closer to eventually reaching the moon. But if we’d built a Wright Flyer with a fuel tank a thousand times bigger, or given it an engine a thousand times more powerful, would we have gotten any closer to that destination?