Whether it is Sora, the currently popular text-to-video model, or ChatGPT 3.5, which emerged a year ago, both have been a tremendous shock to everyone. While using ChatGPT, I became very curious about its cognitive level and raised a question: does ChatGPT have self-awareness? And if we want an AI to have self-awareness, what should we do? Teacher Zhang Jiang's article inspired me greatly, and this article is my summary of what I learned on this topic.
My earliest thoughts on whether AI will have self-awareness came from the 2022 news about Google firing engineer Blake Lemoine (at that time, ChatGPT 3.5 had not yet appeared). This senior software engineer claimed that Google's AI chatbot LaMDA had self-awareness, offering his chat records with LaMDA as proof. Looking back now, LaMDA's level seems close to that of ChatGPT 3.5, which people have long since become accustomed to.
What is Consciousness#
We must first clarify the meaning of consciousness. In general contexts, we can understand it as a biological entity's subjective perception and experience of itself and its environment. In neuroscience, consciousness is typically divided into two main components:
- Wakefulness: the alertness of the organism, or its ability to respond to external stimuli. For example, when you are awake and alert, you are in a state of wakefulness.
- Awareness: the subjective experience of specific thoughts, feelings, perceptions, and memories. For instance, when you see a cat and know that you are looking at a cat, that is your awareness of your visual perception.
This still does not cover all concepts of consciousness. For example, some philosophical viewpoints, such as panpsychism, argue that all matter has some degree of consciousness, even if that consciousness may be very weak or different from that of humans and other higher animals. "Consciousness" is a complex topic that involves many different disciplines, including neuroscience, psychology, philosophy, cognitive science, biology, physics, and even artificial intelligence research. Different disciplines may have different definitions and theories, but the common goal is to understand how we experience and comprehend ourselves and the world around us.
In neuroscience, which studies consciousness most intensively, consciousness is considered to have three levels (see the paper "What is consciousness, and could machines have it?" in the references):
- C0: unconscious processing versus conscious states, i.e., what kinds of work require conscious participation, and under what conditions. The authors conducted many experiments verifying the existence of unconscious processing; research in this area is relatively mature.
- C1: global availability, i.e., information integration and unity. This level is related to Integrated Information Theory, which studies how information integration occurs and how to measure it quantitatively.
- C2: self-monitoring, which is the core of self-awareness.
The following will discuss AI's consciousness and self-awareness from these three levels.
First, we can use an example to illustrate what C0-level consciousness is, that is, what "unconscious processing" really is.
For example, in the classic checker-shadow illusion, the squares labeled A and B on a checkerboard are actually the same shade of gray, but our first reaction is always that A and B are different colors. This is because our brain has already done some logical processing without conscious involvement: it infers that B lies in shadow and compensates for the lighting, so B's perceived color differs from A's. Many later experiments found a vast amount of unconscious processing in our brains, including simple logical calculation and even decision-making. For instance, someone who has memorized the multiplication table will retrieve products through unconscious processing, which is why such calculations feel instantaneous.
Theories and Models of Consciousness#
To understand C1-level consciousness, it is necessary to model consciousness. Currently, there are two mainstream models: the GNW (Global Neuronal Workspace) and the IIT (Integrated Information Theory) models.
The Global Neuronal Workspace (GNW) is a model of consciousness proposed by American psychologist Bernard J. Baars and neuroscientists Stanislas Dehaene and Jean-Pierre Changeux. It is one of the dominant scientific theories about consciousness. It posits that consciousness arises from certain structural characteristics of the brain. This theory involves the concept of a "workspace" in the brain, where new information competes with and replaces old information. When the activity of one or more regions exceeds a certain threshold, it triggers a wave of neural excitation that spreads throughout the neuronal workspace, making the signal available for a series of auxiliary processes. The act of globally broadcasting this information is what makes it conscious.
This mode of operation is somewhat similar to a collection of small programs: these small programs do not require conscious participation and can automatically complete a series of tasks. What, then, is consciousness? GNW holds that consciousness is like a stage: under special circumstances or stimuli, these small programs are loaded into the global workspace, bringing them to the center of the stage.
In this way, complex information processing can be done in this space, such as logical reasoning and decision planning. These require the participation of the entire brain. Moreover, consciousness can also send signals back to these small programs, allowing for quick actions.
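To make the "ignition" idea concrete, here is a minimal toy sketch in Python (my own illustration, not a model from the GNW literature): several specialized processors compute salience scores unconsciously, and only when the strongest one crosses an assumed threshold is its content broadcast to all the others.

```python
# Toy GNW-style "ignition": sub-threshold activity stays local and
# unconscious; the strongest supra-threshold content is broadcast
# globally. Threshold and scores are made-up illustrative values.
THRESHOLD = 0.8

class Processor:
    def __init__(self, name):
        self.name = name
        self.inbox = []                     # contents received via broadcast

def workspace_step(processors, activations):
    # activations: unconscious, locally computed salience per processor
    winner = max(activations, key=activations.get)
    if activations[winner] >= THRESHOLD:
        for p in processors:                # "ignition": global broadcast
            p.inbox.append(winner)
        return winner                       # this content becomes "conscious"
    return None                             # processing stays unconscious

procs = [Processor(n) for n in ("vision", "audition", "touch")]
print(workspace_step(procs, {"vision": 0.9, "audition": 0.3, "touch": 0.1}))  # vision
print(workspace_step(procs, {"vision": 0.2, "audition": 0.4, "touch": 0.1}))  # None
```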
Scientists have found much evidence of long-range connections spanning brain regions, which can be considered a physical basis for global activation. In 2017, a report described the discovery of "giant neurons": in the published imaging, each color represents one neuron, and some have extremely long projections, essentially spanning the entire brain. This has become physical evidence for the Global Workspace Theory.
The second model is called Integrated Information Theory (IIT). According to IIT, the degree of consciousness of a system (such as a network or brain) can be judged by measuring the interconnectedness and integration of its components. This theory emphasizes the following two key concepts:
- Integration: IIT posits that consciousness is composed of many different elements (such as neurons) that, while independent, must be integrated into an indivisible whole to form a conscious experience. This integrative quality is a fundamental characteristic of consciousness. For example, our conscious experience at any given moment is a whole; it cannot contain only one part of the information (such as color or shape) while ignoring the rest.
- Causal power: the influence of the system's current state on its future state, and of its past state on its current state. Systems with high causal power can generate complex interactions among their components, which is key to forming consciousness. For example, interactions between neurons form our thoughts and feelings.
These two concepts together constitute the core of IIT, which states that only when a system exhibits high levels of integration and causal power among its components can it be said to possess consciousness. This theory can even quantitatively define the "degree of consciousness" φ.
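The real Φ of IIT is defined over partitions of a system's cause-effect structure and is notoriously hard to compute. As a loose intuition pump only, the sketch below uses total correlation (how far a joint distribution over two parts exceeds what the parts explain independently) as a crude stand-in for "integration"; it is explicitly not the IIT formalism, and all distributions here are illustrative assumptions.

```python
import numpy as np

def entropy(p):
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def total_correlation(joint):
    # joint: 2-D array with joint[i, j] = P(X1=i, X2=j).
    # Integration proxy: H(X1) + H(X2) - H(X1, X2).
    p1 = joint.sum(axis=1)                  # marginal of X1
    p2 = joint.sum(axis=0)                  # marginal of X2
    return entropy(p1) + entropy(p2) - entropy(joint.flatten())

independent = np.array([[0.25, 0.25],
                        [0.25, 0.25]])      # parts independent: no integration
correlated  = np.array([[0.5, 0.0],
                        [0.0, 0.5]])        # parts always agree: fully integrated

print(total_correlation(independent))       # 0.0 bits
print(total_correlation(correlated))        # 1.0 bit
```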
We can illustrate IIT with a concrete example. Suppose you slap your own arm. We can observe this action at two levels: at the microscopic level, the slap undoubtedly causes the death of some cells in the arm and hand; from the perspective of "I" as a whole person, those cells are constrained by "me." Yet "I" am a system composed of a vast number of cells. If we follow conventional causal theory (reductionism), the characteristics of "me" as a human body are determined by my cells, just as "I" fear fire because my cells fear fire: once cells are burned, they die. However, people can sacrifice themselves for their ideals and beliefs, even allowing their cells, or their entire being, to be burned. Clearly, this transcends the conventional micro-to-macro direction of causation and instead represents a causal inversion from macro to micro. "I" possess free will, so I can slap myself. All of this is due to the existence of a higher-level whole, which can act as an independent subject exerting "causal power," directing the causal arrow from the whole to its parts: I want to slap my arm, so the action occurs, and cells die along with it. The second key concept of Integrated Information Theory emphasizes precisely this "causal power."
There are certainly more models attempting to explain the mystery of consciousness than the two above. In 2022, Nature Reviews Neuroscience published a review summarizing nearly all of the academic community's models of consciousness; for details, see the summary table in that review (linked in the references).
Conscious Turing Machine#
The purpose of the above discussion on modeling consciousness is to reproduce consciousness at the software level. We know that existing computers are Turing machines. Manuel Blum, a Turing Award winner, and his wife Lenore Blum published a paper in PNAS proposing a "Conscious Turing Machine" (CTM): a computable architecture for consciousness.
The Conscious Turing Machine can be seen as a concrete architecture for Global Workspace Theory: a collection of small programs performs various distributed tasks below, and above them sits a global workspace. The key point is that the Conscious Turing Machine implements explicit mechanisms for passing information upward and downward between the two.
The Conscious Turing Machine (CTM) model presents a composite architecture that attempts to simulate human conscious processing. In this model, conscious processing is envisioned as a multi-layered, highly interactive information-processing system, involving multiple steps from basic sensory input to complex decision output.
External inputs are first captured through sensory organs, and this sensory information enters the system in a read-only form, reflecting stimuli from the real world such as visual images, sounds, and touch. Once this raw data is received, it moves into the short-term memory module, which is the core of consciousness processing. Short-term memory plays a crucial filtering and integrating role here; it not only limits the amount of information that can be processed simultaneously (reflecting human attention limits) but also improves processing efficiency by integrating information into chunks. These chunks represent units of information formed through conscious processing and cognitive restructuring, which can be seen as the basic "currency" of conscious activity.
Meanwhile, long-term memory serves as a vast backend database, storing personal experiences, knowledge, and skills. This part of memory is usually in an unconscious state but can be elevated to the conscious level through internal mechanisms, such as UP-Tree competition. This competition mechanism reflects how our attention shifts from one topic to another and how we extract relevant information from a vast knowledge base to meet current situational demands.
In short-term memory, selected information chunks are sent to various dedicated processors through a rapid broadcasting system. These processors each have their own roles, processing specific types of information or performing specific tasks. For example, some processors focus on parsing visual-spatial information, some are responsible for internal speech and language processing, while others may connect to external databases and algorithms, such as Google search or AlphaGo. This distributed processing mechanism simulates how the brain processes various information in parallel and allows consciousness to consider multiple aspects and possibilities simultaneously.
When information is further analyzed and integrated within these processors, the final output is realized through an external output module, which can be speech, writing, or other forms of physical behavior. This step completes the entire loop from perception to action, reflecting how consciousness drives our interaction with the environment and formulates responses based on external feedback and internal goals.
Throughout the model, the design of the information flow and processing aims to reflect the flexibility, dynamism, and creativity of human consciousness. It demonstrates how a system can adapt to and influence its environment through different levels of processing and integration, from simple sensory input to complex thought and behavioral output. Although abstract, the Conscious Turing Machine attempts to provide a framework for understanding and reproducing the complexity and diversity of human consciousness.
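As a rough sketch of how the Up-Tree competition and broadcast might look in code (my own simplified reading of the Blum & Blum architecture; the `weight` field loosely follows the paper, everything else is illustrative):

```python
# Toy CTM step: processors submit weighted chunks; a binary-tree
# competition selects the single chunk that reaches short-term memory.
class Chunk:
    def __init__(self, source, content, weight):
        self.source, self.content, self.weight = source, content, weight

def up_tree_winner(chunks):
    # Pairwise competition up a binary tree: at each level, the chunk
    # with the larger weight advances one step closer to STM.
    layer = list(chunks)
    while len(layer) > 1:
        nxt = [max(pair, key=lambda c: c.weight)
               for pair in zip(layer[::2], layer[1::2])]
        if len(layer) % 2:                 # odd one out advances unopposed
            nxt.append(layer[-1])
        layer = nxt
    return layer[0]

submissions = [
    Chunk("vision",  "red blob ahead",        0.7),
    Chunk("hearing", "loud bang on the left", 0.9),
    Chunk("memory",  "bangs mean danger",     0.5),
]
stm = up_tree_winner(submissions)          # the single chunk reaching STM
print(f"broadcast: {stm.source}: {stm.content}")
# In the CTM, this winning chunk is then broadcast from STM back down
# to every processor, making its content globally available.
```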
Enhancing Planning and Imagination Abilities of Consciousness#
To explore the planning and imagination capabilities of consciousness, Jürgen Schmidhuber, one of the fathers of LSTM (Long Short-Term Memory networks), together with David Ha proposed a reinforcement learning framework called World Models in 2018. They argued that a reinforcement learning agent should embed a virtual world, i.e., a world model. Their experiments demonstrated that agents embedded with such virtual worlds can learn more fully from relatively small amounts of data, because the agents can dream.
The World Models reinforcement learning framework is an advanced method within the field of reinforcement learning that aims to improve the learning efficiency and adaptability of agents (such as robots or software agents) by simulating environments. This approach originates from how humans and animals use internal models to predict and interpret the surrounding world, attempting to replicate this mechanism in artificial intelligence systems.
The main components of the World Models framework are:
- Visual Module (V): extracts useful features and representations from raw inputs (such as pixels). In humans, this corresponds to perceiving the environment through vision and understanding surrounding objects and scenes. In machine learning, this is typically implemented with convolutional neural networks (CNNs) or other image-processing techniques.
- Memory Module (M): handles time-series data, helping the agent understand the temporal dependencies and dynamics of the environment. This is akin to human working memory, which stores and processes information about recent events. In computational models, this can be implemented with recurrent neural networks (RNNs) or long short-term memory networks (LSTMs).
- Controller (C): makes decisions based on the outputs of the visual and memory modules and executes actions. In humans, this is like deciding how to act based on one's current understanding of the environment and goals. In reinforcement learning, this is typically a policy network that determines which action to take in a given state to maximize future rewards (a minimal skeleton of all three modules is sketched after this list).
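Here is a minimal PyTorch skeleton of the V-M-C split. The shapes and layer choices are illustrative assumptions; the actual paper uses a variational autoencoder for V, a mixture-density RNN (MDN-RNN) for M, and a simple linear controller for C.

```python
import torch
import torch.nn as nn

class V(nn.Module):                          # vision: pixels -> latent z
    def __init__(self, z_dim=32):
        super().__init__()
        self.enc = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2), nn.ReLU(),
            nn.Flatten(),
            nn.LazyLinear(z_dim),
        )
    def forward(self, frame):                # frame: (B, 3, H, W)
        return self.enc(frame)

class M(nn.Module):                          # memory: RNN over (z, action)
    def __init__(self, z_dim=32, a_dim=3, h_dim=256):
        super().__init__()
        self.rnn = nn.LSTM(z_dim + a_dim, h_dim, batch_first=True)
        self.next_z = nn.Linear(h_dim, z_dim)
    def forward(self, z, a, state=None):     # predict the next latent state
        h, state = self.rnn(torch.cat([z, a], dim=-1), state)
        return self.next_z(h), state

class C(nn.Module):                          # controller: (z, h) -> action
    def __init__(self, z_dim=32, h_dim=256, a_dim=3):
        super().__init__()
        self.policy = nn.Linear(z_dim + h_dim, a_dim)
    def forward(self, z, h):
        return torch.tanh(self.policy(torch.cat([z, h], dim=-1)))
```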
Specifically, the world model is an RNN whose input consists of two parts: the encoded world state and the agent's action at time t-1. The RNN's job is to predict the next state and reward. With such a world model, a reinforcement learning agent gains two benefits during learning. On one hand, the world model itself can be trained deliberately, with a supervised learning objective. On the other hand, the agent can dream, which is why world models can learn more fully from relatively small amounts of data.
The dreaming process takes the (inevitably imperfect) world model and runs it on hypothetical actions: starting from some moment t, the world model acts as a simulator of the real world, generating the next state and reward for each proposed action, and this dreamed data is then used to train the agent's controller. In this way the objective function can be optimized, and reward maximized, entirely within the dream, which greatly increases the number of training samples and reduces training time. CMA-ES (Covariance Matrix Adaptation Evolution Strategy), used here to train the controller, is a gradient-free optimization algorithm. Thus, with a world model the agent has a simulator: it can set a future goal, find a path to that goal inside the simulated world model, and thereby generate step-by-step actions, i.e., plan.
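Building on the skeleton above, a "dream" is just a rollout inside M, with no environment in the loop. The sketch below is again illustrative: it starts from a made-up latent state and omits a reward head, showing only the controller acting on imagined states.

```python
def dream_rollout(m, c, z0, horizon=50):
    """Roll the world model forward from latent z0 without touching the
    real environment; returns the imagined latent trajectory."""
    z, state, traj = z0, None, [z0]
    for _ in range(horizon):
        h = state[0][-1] if state is not None else torch.zeros(z.shape[0], 256)
        a = c(z, h)                                  # controller picks an action
        z_pred, state = m(z.unsqueeze(1), a.unsqueeze(1), state)
        z = z_pred.squeeze(1)                        # imagined next latent state
        traj.append(z)
    return traj

m, c = M(), C()
z0 = torch.randn(1, 32)                              # a made-up starting latent
print(len(dream_rollout(m, c, z0)))                  # 51 imagined states
```

In the full method, a predicted reward along such imagined trajectories is what CMA-ES maximizes when tuning the controller's parameters.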
Some friends may not be familiar with RNNs. Recurrent Neural Networks (RNNs) are a special type of neural network designed to handle sequential data and temporal dependency issues. Unlike traditional feedforward neural networks, RNNs have the ability to process information associated with previous and subsequent inputs, making them particularly suitable for handling language, time series, and other continuous data. The core of RNNs lies in their recurrent structure, which allows information to flow between different time steps in the network. This structure enables RNNs to retain and utilize information from previous time steps when processing new inputs, thus capturing temporal relationships and dependencies in the data. In practical applications, this means RNNs can remember past information and make more accurate predictions or decisions based on that information.
Although RNNs are very powerful in handling sequential data, they also face issues of vanishing or exploding gradients, which can affect the network's ability to learn long-term dependencies. To overcome these problems, researchers have developed more advanced RNN variants, such as Long Short-Term Memory (LSTM) networks and Gated Recurrent Units (GRUs). These improved models introduce gating mechanisms that help the network learn and retain long-term information more effectively, thus performing better in complex sequential tasks.
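For readers who want to see the recurrence itself, here is a minimal vanilla RNN step in NumPy (the dimensions and the 5-step sequence are arbitrary choices for illustration): the same weights are applied at every time step, and the hidden state h carries information forward.

```python
import numpy as np

def rnn_step(x_t, h_prev, W_xh, W_hh, b_h):
    # One recurrence step: new hidden state mixes the current input
    # with the previous hidden state through shared weights.
    return np.tanh(x_t @ W_xh + h_prev @ W_hh + b_h)

rng = np.random.default_rng(0)
x_dim, h_dim, T = 4, 8, 5
W_xh = rng.normal(scale=0.1, size=(x_dim, h_dim))
W_hh = rng.normal(scale=0.1, size=(h_dim, h_dim))
b_h  = np.zeros(h_dim)

h = np.zeros(h_dim)
for t in range(T):                          # unroll over a 5-step sequence
    x_t = rng.normal(size=x_dim)
    h = rnn_step(x_t, h, W_xh, W_hh, b_h)   # h now summarizes steps 0..t
print(h.shape)                              # (8,)
```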
However, while the line of work on world models is excellent, one important regret is that the world model still lacks a self. Although the agent can feed actions back to itself, an action is not the same as "reflection": when we speak of "reflection" in the context of consciousness, we usually mean a mental state. The world models we humans build include ourselves, whereas existing world-model research does not encompass the self. Another regret is that dreaming in this line of work is non-autonomous: the agent strictly separates "playing the game" from dreaming, whereas humans can switch between the two, or even do both, at any moment.
Self-Awareness and Self-Referential Consciousness#
"Self-reference," usually translated into Chinese as "自指" or "self-reference," refers to a situation where a statement, expression, idea, or other type of information directly or indirectly refers to or involves itself. Self-reference is a common concept in mathematics, logic, philosophy, art, and other fields.
For example, in linguistics, a self-referential example might include a statement like, "This sentence is false." This statement creates a paradox by referencing itself because if it is true, then it is false, but if it is false, then it is true. In computer science, a self-referential example might include a computer program that references or modifies its own parts within its code, or a data structure (such as a recursive data structure) that references itself. This ability for self-reference is sometimes considered a hallmark of consciousness or self-awareness because it involves a system's ability to reflect on and understand its own processes or states.
What is the difference between self-awareness and self-referential consciousness? Self-awareness refers to a conscious system capable of reflection, reasoning, imagination, and other cognitive activities directed at itself, such as the human brain. Self-referential consciousness refers to a conscious system that achieves such self-directed reflection, reasoning, and imagination through the principle of self-reference. The latter encompasses the former: self-reflection achieved through self-referential techniques is a perfect self-mapping, "perfect" in both spatial and temporal dimensions, whereas ordinary self-awareness is likely an imperfect self-mapping. The latter can thus be seen as a normative theory of self-awareness, a theoretical prototype, while actual self-awareness systems may be imperfect realizations of it, limited and distorted by various factors. As a theoretical prototype, self-referential consciousness provides a target for the pursuit of perfect self-mapping and self-understanding.
So how can we achieve self-reference in the field of computers? We consider two levels: hardware and software, thus the problem is decomposed into two:
- How can a machine achieve self-replicating production?
- How can a piece of code achieve self-replicating generation?
Regarding hardware, von Neumann designed a machine capable of self-replication as early as the 1940s (see his posthumously published "Theory of Self-Reproducing Automata"). When exploring self-replication in software, however, a problem arises that at first seems unsolvable: "infinite recursion."
```python
print('Hello World')                        # program 1
print("print('Hello World')")               # program 2 prints program 1's source
print('print("print(\'Hello World\')")')    # program 3 prints program 2's, and so on forever
```
A classic example is attempting to create a "self-printing" program, that is, a program that can output its own source code (as shown above). At first glance, this seems to fall into infinite recursion, because to print its own source code, the program appears to need to reference itself infinitely. However, inspired by the philosopher and logician W. V. O. Quine (after whom such self-reproducing programs are called "quines"), mathematicians and programmers found a clever way to achieve this goal, avoiding the recursion trap.
The core of this solution lies in the dynamic interaction between the program and its execution environment (such as the operating system). In this way, the program unfolds during execution, generating output that is identical to its source code. Such self-printing programs typically contain two main parts: one part is the "template" or "framework" (here referred to as the "virtual part" or "virtual aspect"), while the other part is responsible for generating the actual code of this framework (referred to as the "actual aspect"). The focus is on how the content of these two parts maps to each other and how the program's structure ensures that the output code accurately reflects itself.
Through this method, the self-printing program is like "looking in a mirror," where the "virtual part" provides a pattern, and the "actual part" fills in this pattern to produce a complete self-description. This is not only a clever trick in programming and mathematics but also provides a way to understand how to differentiate consciousness from non-consciousness through software and functional structure. In this framework, self-awareness can be seen as the system's ability to functionally distinguish and integrate its virtual and actual states. In other words, if a system can functionally distinguish and manage its internal representations (virtual aspect) and external manifestations (actual aspect), then this system can be considered to possess some form of self-awareness.
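A classic two-line Python quine (a standard folk example, not taken from the article) makes this structure concrete: the string `s` is the "virtual part," a template, and the final `print` is the "actual part" that instantiates the template with its own representation.

```python
s = 's = %r\nprint(s %% s)'
print(s % s)
```

Running it prints exactly these two lines: `%r` splices the template into itself at execution time, so the program never needs to contain a literal copy of its whole source, which is precisely how the infinite regress is avoided.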
In fact, the self-replicating automaton constructed by von Neumann also follows a similar principle, where the key to achieving self-replication is—unfolding over time. Although many scholars believe that achieving perfect self-reference is a challenge because it seems to lead to infinite recursion, machines can achieve complete self-reference by utilizing Quine's techniques. This is akin to machines having a dialogue with themselves, gradually constructing their complete image through the unfolding of time. This process breaks through the boundaries of traditional beliefs about impossible self-reflection, demonstrating the possibility of achieving self-cognition through clever design (i.e., self-reflective ability).
When analyzed separately, both the machine and its description exhibit incompleteness. However, when we combine the two and operate according to the logic of natural time flow (from t to t+1) with the intervention of operating systems or nature, we can achieve a complete self-referential process. The core of this process lies in ensuring that the virtual world matches the real world as closely as possible and that they can operate in harmony. Although each has its shortcomings, the natural process can bridge these gaps, allowing the system to self-replicate and self-reference, thus achieving completeness.
Moreover, when the machine and its description form a mirror relationship, we touch upon fractal theory and its applications in nature and technology, such as von Neumann's self-replicating structures. This phenomenon of mutual imaging is not only a marvelous example of self-replication but also showcases the complexity of self-reference and its similarities in both nature and technology.
Extending these thoughts to humanity's pursuit of universal truths: although humans may never perfectly understand a universe that includes themselves, with self-referential techniques we do not need perfect cognition. Humans can combine their partial cognition, including cognition achieved through AI, into a whole together with themselves, leaving the unknown parts for nature to answer. The whole composed of humans and machines can then not only simulate the workings of the universe more accurately but also drive ever deeper understanding, bringing humanity's exploration of universal truths closer to perfection.
References#
- Zhang Jiang's article (in Chinese): https://mp.weixin.qq.com/s/bZlhzIuscWyQEB_2nLr1Ag
- Dehaene, S., Lau, H., & Kouider, S. (2017). What is consciousness, and could machines have it? Science. https://www.science.org/doi/10.1126/science.aan8871
- Seth, A. K., & Bayne, T. (2022). Theories of consciousness. Nature Reviews Neuroscience. https://www.nature.com/articles/s41583-022-00587-4