In spite of the remarkable achievements of ChatGPT and similar large language models, there are growing concerns about the trajectory of the underlying artificial neural networks (ANNs).
There are two prominent issues with ANNs, as highlighted by Cornelia Fermüller, a computer scientist at the University of Maryland. First, they consume a remarkably large amount of power, which poses sustainability concerns. Second, they lack transparency, making it arduous to unravel their inner workings and comprehend the reasons for their strong performance. Consequently, analogy-based reasoning, a fundamental human cognitive ability involving symbolic representations of objects, concepts, and their associations, remains an elusive goal for ANNs.
The limitations observed in ANNs can be attributed to the underlying architecture and constituent elements, namely individual artificial neurons. These neurons function by receiving inputs, conducting computations, and generating outputs. Contemporary ANNs are intricate systems comprising interconnected computational units that are trained to perform specific tasks.
The limitations of ANNs have long been apparent, as exemplified by the challenge of differentiating circles from squares. To address this, an ANN may employ two output neurons—one dedicated to identifying circles and the other to recognizing squares. However, if the objective involves incorporating color information, such as distinguishing between blue and red shapes, the complexity escalates, necessitating the use of four output neurons: one for each combination of color and shape (blue circle, blue square, red circle, red square). As the number of distinct features grows, the requirement for an augmented number of neurons becomes inevitable.
This approach does not reflect how our brains perceive the natural world, with its rich and diverse array of variations. "Adopting such a perspective would imply that our brains are equipped with individual neurons dedicated to detecting every possible combination, leading to notions like a specific neuron responsible for identifying a purple Volkswagen," explained Bruno Olshausen, a neuroscientist at the University of California, Berkeley.
Olshausen and his colleagues present a contrasting perspective, suggesting that the brain's information representation is mediated by the combined activity of multiple neurons. Thus, the perception of a purple Volkswagen is not encoded solely by the behavior of a single neuron, but rather by the coordinated firing patterns exhibited by thousands of neurons. Notably, these same sets of neurons, engaging in different firing configurations, have the potential to represent entirely different concepts, such as that of a pink Cadillac.
This premise sets the stage for an entirely novel computational paradigm known as hyperdimensional computing. Crucially, this approach revolves around the representation of discrete fragments of information, be it the concept of a car, its distinctive characteristics encompassing make, model, or color, or the comprehensive integration of these attributes, as a unified construct termed a hyperdimensional vector.
In essence, a vector denotes a structured sequence of numerical values. For instance, a three-dimensional vector comprises three numerical components: the x, y, and z coordinates that delineate a point within a three-dimensional space. A hyperdimensional vector, often referred to as a hypervector, by contrast comprises an array of numbers, perhaps 10,000 of them, symbolizing a point situated within a vast 10,000-dimensional space. Leveraging these mathematical constructs and the associated algebraic operations that govern them provides a remarkably versatile and potent toolset, capable of transcending certain limitations of contemporary computing and paving the way for a novel paradigm in artificial intelligence.
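To make the contrast concrete, here is a minimal Python sketch (using NumPy); the 10,000-dimensional size follows the article's example, and the choice of random +1/-1 entries is one common bipolar convention, an illustrative assumption rather than a requirement.

```python
import numpy as np

rng = np.random.default_rng()
D = 10_000  # dimensionality used in the article's example

# An ordinary three-dimensional vector: the x, y, z coordinates of a point.
point_3d = np.array([1.0, 2.0, 3.0])

# A hypervector: here, 10,000 random entries, each +1 or -1 (a common bipolar convention).
hypervector = rng.choice([-1, 1], size=D)

print(point_3d.shape)     # (3,)
print(hypervector.shape)  # (10000,)
```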
Olshausen expressed profound enthusiasm for this development, asserting that it represents the most exciting advancement he has encountered throughout his extensive career. In his view, as well as that of numerous experts, hyperdimensional computing heralds a transformative era characterized by computational efficiency, resilience, and unparalleled transparency in machine-driven decision-making.
Enter High-Dimensional Spaces
To comprehend the role of hypervectors in enabling computing capabilities, let us revisit the scenario involving images containing red circles and blue squares. In this context, we need vectors to represent the fundamental variables, namely SHAPE and COLOR, as well as vectors to encapsulate the distinct values that can be assigned to these variables: CIRCLE, SQUARE, BLUE, and RED.
Each vector must exhibit distinctiveness, a characteristic that can be quantified through the property of orthogonality, denoting the perpendicular relationship between vectors. In three-dimensional (3D) space, three vectors achieve orthogonality by aligning with the x, y, and z axes, respectively. Extending this concept to a 10,000-dimensional space, it becomes evident that there exist 10,000 mutually orthogonal vectors.
When we introduce the possibility of vectors being approximately orthogonal, the quantity of distinct vectors in a high-dimensional space expands exponentially. In the context of a 10,000-dimensional space, the number of nearly orthogonal vectors reaches millions.
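This near-orthogonality is easy to check empirically. The sketch below, a toy illustration assuming bipolar +1/-1 hypervectors and cosine similarity as the measure of orthogonality, shows that two independently drawn random hypervectors are almost always nearly orthogonal.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 10_000

def cosine(a, b):
    """Cosine similarity: 0 means orthogonal, 1 means identical."""
    return float(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Two independently drawn random hypervectors are almost always nearly orthogonal.
a = rng.choice([-1, 1], size=D)
b = rng.choice([-1, 1], size=D)
print(round(cosine(a, b), 3))  # typically within about 0.01 of zero
```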
To represent SHAPE, COLOR, CIRCLE, SQUARE, BLUE, and RED as distinct vectors, we can take advantage of the abundance of nearly orthogonal vectors in a high-dimensional space. Assigning six random vectors to represent these six items is a practical approach, as the likelihood of them being nearly orthogonal is extremely high. In a seminal paper from 2009, Pentti Kanerva, a researcher at the Redwood Center for Theoretical Neuroscience at the University of California, Berkeley, emphasized the ease of generating nearly orthogonal vectors as a key advantage of hyperdimensional representation.
The aforementioned paper represents an evolution of research initiated in the mid-1990s by Kanerva and Tony Plate, who was pursuing his doctorate under the guidance of Geoff Hinton at the University of Toronto. Independently, Kanerva and Plate laid the foundation for the algebraic framework governing hypervector manipulation, thereby providing initial insights into its potential for high-dimensional computing.
The computational system devised by Kanerva and Plate provides a comprehensive framework for manipulating the hypervectors associated with shapes and colors. Through the application of specific mathematical operations, these hypervectors can be transformed, reflecting the symbolic manipulation of corresponding conceptual representations.
Within the hyperdimensional computing framework, the first fundamental operation is multiplication, which allows for the fusion of concepts. By multiplying the hypervector representing SHAPE with the hypervector representing CIRCLE, a new composite vector emerges, symbolizing the concept of 'SHAPE is CIRCLE.' Notably, this bound vector is nearly orthogonal to both SHAPE and CIRCLE, yet its individual components remain recoverable. This feature proves essential when it comes to extracting specific information from bound vectors. For instance, by employing an appropriate unbinding process, the hypervector representing the color PURPLE can be retrieved from a bound vector that represents a Volkswagen.
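As a rough illustration, the following sketch realizes binding as element-wise multiplication of bipolar hypervectors, one common convention (not necessarily the exact scheme Kanerva or Plate used): the bound vector looks unrelated to its factors, yet multiplying by SHAPE again recovers CIRCLE.

```python
import numpy as np

rng = np.random.default_rng(1)
D = 10_000

def hv():
    """A fresh random bipolar hypervector."""
    return rng.choice([-1, 1], size=D)

def cosine(a, b):
    return float(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))

SHAPE, CIRCLE = hv(), hv()

# Binding: element-wise multiplication fuses the two concepts into one hypervector.
shape_is_circle = SHAPE * CIRCLE

# The bound vector is nearly orthogonal to each of its factors...
print(round(cosine(shape_is_circle, SHAPE), 3))   # near 0
print(round(cosine(shape_is_circle, CIRCLE), 3))  # near 0

# ...yet either factor can be recovered: multiplying by SHAPE again "unbinds" it,
# because each +1/-1 entry is its own multiplicative inverse.
recovered = shape_is_circle * SHAPE
print(cosine(recovered, CIRCLE))  # exactly 1.0
```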
Within the framework of hyperdimensional computing, the second essential operation is addition, which facilitates the generation of a new vector known as a superposition, representing a combination of concepts. For instance, by adding together two bound vectors, namely 'SHAPE is CIRCLE' and 'COLOR is RED,' a composite vector emerges, symbolizing a circular shape that is red in color. Similarly, the superposed vector can be decomposed, allowing for the retrieval of its constituent components.
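Continuing the same toy convention, the sketch below superposes two bound pairs into a single 'red circle' record and then queries it for the shape; the specific similarity values are illustrative, not taken from any published system.

```python
import numpy as np

rng = np.random.default_rng(2)
D = 10_000

def hv():
    return rng.choice([-1, 1], size=D)

def cosine(a, b):
    return float(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))

SHAPE, CIRCLE, COLOR, RED = hv(), hv(), hv(), hv()

# Superposition: adding two bound pairs yields one vector describing a red circle.
red_circle = SHAPE * CIRCLE + COLOR * RED

# Querying the record: unbind with SHAPE, then see which known value it resembles.
query = red_circle * SHAPE
print(round(cosine(query, CIRCLE), 2))  # roughly 0.7: the shape stored here is CIRCLE
print(round(cosine(query, RED), 2))     # roughly 0.0: RED is not the shape
```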
The third crucial operation in hyperdimensional computing is permutation, which involves reorganizing the individual elements within vectors. As an illustration, suppose you have a three-dimensional vector with labeled values x, y, and z. Through permutation, you can rearrange the values, shifting x to y, y to z, and z to x. According to Kanerva, this capability of permutation enables the construction of structure and facilitates the handling of sequential phenomena. To demonstrate, consider two events represented by hypervectors A and B. While superposing them into a single vector would obliterate the order of events, combining addition with permutation preserves the temporal sequence. By reversing the operations, the events can be retrieved in their original order.
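A common way to realize permutation is a cyclic shift of the vector's elements. The sketch below, again a toy illustration under the same bipolar convention, shows that a plain sum forgets which of two events came first, while shifting the earlier event before adding preserves the order and lets it be undone.

```python
import numpy as np

rng = np.random.default_rng(3)
D = 10_000

def hv():
    return rng.choice([-1, 1], size=D)

def cosine(a, b):
    return float(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))

A, B = hv(), hv()  # hypervectors for two events

unordered = A + B              # plain superposition: "A then B" equals "B then A"
ordered = np.roll(A, 1) + B    # permute (cyclically shift) the earlier event

print(cosine(unordered, B + A))                      # 1.0: order has been lost
print(round(cosine(ordered, np.roll(B, 1) + A), 2))  # near 0: the two orderings differ

# Reversing the operations retrieves the earlier event in this noise-free toy case.
print(cosine(np.roll(ordered - B, -1), A))           # 1.0
```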
Collectively, these three operations proved to be sufficient in establishing a formal algebra of hypervectors, which in turn facilitated symbolic reasoning. However, despite the transformative potential of hyperdimensional computing, many researchers, including Olshausen, initially struggled to fully comprehend its implications. As Olshausen admitted, the significance of this paradigm didn't immediately resonate with him: 'It just didn't sink in.'
Harnessing the Power
In 2015, Eric Weiss, a student under Olshausen's guidance, showcased a remarkable facet of hyperdimensional computing's distinct capabilities. Weiss successfully devised a method to encapsulate a complex image within a singular hyperdimensional vector, encompassing comprehensive information about all the objects present, including their diverse attributes such as colors, positions, and sizes.
Olshausen vividly recalls the transformative moment when Weiss presented his findings: 'I practically fell out of my chair! It was as if a sudden burst of illumination flooded my mind.'
Following this breakthrough, an increasing number of research teams delved into the development of hyperdimensional algorithms aimed at replicating simple tasks that deep neural networks had begun tackling roughly two decades earlier, such as image classification.
Let us examine an annotated dataset comprising images depicting handwritten digits. Through the utilization of a predefined scheme, an algorithm assesses the distinctive characteristics of each image. Subsequently, the algorithm generates a hypervector corresponding to each image. The algorithm further combines the hypervectors of all zero images, thereby generating a hypervector that represents the concept of zero. This process is then repeated for all digits, resulting in the creation of ten distinct 'class' hypervectors, one for each digit.
Subsequently, when presented with an unlabeled image, the algorithm proceeds to generate a hypervector that represents this new image. The hypervector is then compared against the previously stored class hypervectors. Through this comparison process, the algorithm identifies the digit to which the new image bears the closest resemblance.
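A heavily simplified sketch of this classification step appears below. It assumes that some encoding scheme has already turned each image into a hypervector (that step is not shown), and the function and variable names are illustrative, not drawn from any particular published implementation.

```python
import numpy as np

D = 10_000

def cosine(a, b):
    return float(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))

def build_class_hypervectors(encoded_images, labels):
    """Superpose (sum) the hypervectors of all training images sharing a digit label."""
    classes = {}
    for image_hv, digit in zip(encoded_images, labels):
        classes[digit] = classes.get(digit, np.zeros(D)) + image_hv
    return classes

def classify(image_hv, classes):
    """Return the digit whose class hypervector most resembles the image's hypervector."""
    return max(classes, key=lambda digit: cosine(image_hv, classes[digit]))

# Toy usage with random stand-ins for the encoded images, just to exercise the functions.
rng = np.random.default_rng(4)
encoded = [rng.choice([-1, 1], size=D) for _ in range(20)]
labels = [i % 10 for i in range(20)]
classes = build_class_hypervectors(encoded, labels)
print(classify(encoded[3], classes))  # 3, since encoded[3] contributed to class 3
```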
However, this is merely the initial stage. The true potential of hyperdimensional computing lies in its capacity to compose and decompose hypervectors for the purpose of reasoning. The most recent exemplification of this capability emerged in March when Abbas Rahimi and his colleagues at IBM Research in Zurich employed hyperdimensional computing in conjunction with neural networks to successfully address a longstanding problem in abstract visual reasoning—an endeavor that proves challenging for conventional ANNs as well as some individuals. Referred to as Raven's progressive matrices, this problem presents a 3-by-3 grid of geometric object images, with one position left blank. The task at hand requires the subject to select, from a set of candidate images, the one that best completes the blank position.
Recognizing the significance of the problem in visual abstract reasoning, Abbas Rahimi expressed his team's strong conviction, stating, 'This represents the pinnacle example in the realm of visual abstract reasoning, and we were eager to dive right in.'
In order to employ hyperdimensional computing for solving the problem, the research team embarked on the task by initially constructing a comprehensive dictionary of hypervectors to represent the various objects depicted in each image. Each hypervector in the dictionary encapsulated the characteristics and attributes associated with a specific object. Subsequently, a neural network was trained to analyze an image and generate a bipolar hypervector, one whose elements each take the value +1 or -1, that closely approximated a superposition of hypervectors from the dictionary. As a result, the generated hypervector encapsulated valuable information regarding all the objects and their respective attributes within the image. Rahimi explained, 'The neural network is guided towards a meaningful conceptual space.'
Once the network has produced hypervectors representing the context images as well as the candidate images for the vacant slot, an additional algorithm is employed to examine these hypervectors and generate probability distributions concerning diverse image characteristics. These characteristics encompass factors such as the number of objects, their sizes, and other pertinent attributes present within each image. These probability distributions, which capture the likely features of both the context and candidate images, can subsequently be translated into hypervectors. Employing algebraic operations, these hypervectors facilitate the prediction of the most probable candidate image that best complements the context.
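The full pipeline described in the paper is considerably more involved, but the final selection step can be caricatured as a nearest-match search: pick the candidate whose hypervector is most similar to the one predicted from the context. The sketch below is a loose illustration of that idea only, not the IBM team's code, and `predicted_hv` is a hypothetical stand-in for whatever the algebraic operations produce.

```python
import numpy as np

def cosine(a, b):
    return float(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))

def pick_best_candidate(predicted_hv, candidate_hvs):
    """Pick the candidate panel whose hypervector best matches the predicted one.

    `predicted_hv` stands in for whatever hypervector the algebraic operations
    derive from the eight context panels; how it is computed is not shown here.
    """
    scores = [cosine(predicted_hv, c) for c in candidate_hvs]
    return int(np.argmax(scores))
```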
The team's approach demonstrated outstanding performance, with an accuracy of nearly 88 percent on one problem set; solutions relying solely on neural networks achieved less than 61 percent. The team also showcased the computational efficiency of their system: on 3-by-3 grids it ran almost 250 times faster than a traditional method grounded in symbolic logic rules, which must perform laborious searches through an extensive rulebook to deduce the correct next step.
Hyperdimensional: A Promising Start
Hyperdimensional computing not only empowers us to solve problems in a symbolic manner but also offers solutions to inherent challenges present in traditional computing paradigms. Contemporary computers are susceptible to rapid performance degradation when confronted with errors arising from random bit flips, whereby a 0 might erroneously become a 1 or vice versa. In such cases, the reliance on error-correcting mechanisms built into the system becomes crucial. However, these mechanisms themselves impose a performance penalty that can reach up to 25 percent, as highlighted by Xun Jiao, a computer scientist affiliated with Villanova University.
Hyperdimensional computing boasts remarkable error tolerance, wherein a hypervector remains close to its original state even when subjected to significant random bit flips. This characteristic ensures that the integrity of reasoning processes using these vectors remains largely intact despite the presence of errors. Noteworthy findings from Jiao and his team's research indicate that these systems exhibit fault tolerance levels at least 10 times higher than those observed in traditional artificial neural networks (ANNs). It is worth noting that ANNs already demonstrate orders of magnitude greater resilience compared to conventional computing architectures. Jiao emphasizes the opportunity to harness this exceptional resilience in the development of efficient hardware designs.
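The robustness is easy to see in a toy experiment: flip a sizable fraction of a bipolar hypervector's entries and it still resembles its original far more than it resembles any unrelated vector. The sketch below assumes the same +1/-1 convention as before and a 10 percent error rate chosen purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(5)
D = 10_000

def cosine(a, b):
    return float(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))

original = rng.choice([-1, 1], size=D)

# Flip 10 percent of the entries at random, emulating random bit errors.
corrupted = original.copy()
flipped = rng.choice(D, size=D // 10, replace=False)
corrupted[flipped] *= -1

print(cosine(original, corrupted))                              # 0.8: still unmistakably the same vector
print(round(cosine(original, rng.choice([-1, 1], size=D)), 2))  # about 0.0: an unrelated vector
```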
Transparency stands as another notable advantage of hyperdimensional computing, as the underlying algebra provides clear insights into the reasoning process leading to the system's chosen answer. In contrast, traditional neural networks lack this level of transparency. Recognizing this, Olshausen, Rahimi, and other researchers are actively developing hybrid systems that combine neural networks' ability to map real-world entities to hypervectors, followed by the utilization of hyperdimensional algebra. This approach facilitates tasks such as analogical reasoning, which becomes inherently accessible. Olshausen aptly notes, 'This level of understandability should be expected of any AI system. It should be comprehensible, much like an airplane or a television set.'
The numerous advantages of hyperdimensional computing over traditional computing methodologies suggest its suitability for a new generation of robust and energy-efficient hardware. Its compatibility with in-memory computing systems, wherein computation is performed on the same hardware that stores data, distinguishes it from existing von Neumann architectures that often suffer from inefficiencies in data transfer between memory and the central processing unit. Moreover, hyperdimensional computing lends itself well to the utilization of analog devices that operate at low voltages, enabling remarkable energy efficiency. However, these analog devices are susceptible to random noise. Traditional von Neumann computing encounters a substantial limitation when faced with such randomness, acting as a barrier that cannot be surpassed. In contrast, hyperdimensional computing offers a breakthrough, allowing one to transcend this limitation effortlessly.
Although hyperdimensional computing exhibits significant advantages, it remains in its nascent stage. Fermüller highlights the genuine potential of this approach but emphasizes the necessity of subjecting it to real-world challenges and scaling it up to dimensions comparable to modern neural networks.
"In addressing large-scale problems, Rahimi emphasizes the critical requirement for highly efficient hardware. As an illustration, he raises the question of how to conduct efficient searches across a billion items."
Kanerva anticipates that with time, further revelations will unfold regarding the untapped potential of high-dimensional spaces. He emphasizes that "there are other undisclosed insights concealed within these realms," and considers the current state of computing with vectors as merely the inception of a new era.