Qualitative Analysis of Dynamic Activity Patterns in Neural Networks
Sylvain Chartier, Mounir Boukadoum, "Encoding Static and Temporal Patterns with a Bidirectional Heteroassociative Memory", Journal of Applied Mathematics, vol. 2011, Article ID 301204, 34 pages, 2011. https://doi.org/10.1155/2011/301204
Encoding Static and Temporal Patterns with a Bidirectional Heteroassociative Memory
Abstract
The brain-inspired artificial neural network approach offers the ability to develop attractors for each pattern if feedback connections are allowed. It also exhibits great stability and adaptability with regard to noise and pattern degradation, and it can perform generalization tasks. In particular, the Bidirectional Associative Memory (BAM) model has shown great promise for pattern recognition because of its capacity to be trained under either a supervised or an unsupervised scheme. This paper describes such a BAM, one that can encode patterns of real and binary values, perform multistep pattern recognition of variable-size time series, and accomplish many-to-one associations. Moreover, it will be shown that the BAM can be generalized to multiple associative memories and that it can be used to store associations from multiple sources as well. The various behaviors result from topological rearrangements only; the same learning and transmission functions are kept throughout the models. A consistent architecture is therefore used for different tasks, which increases its practical appeal and modeling importance. Simulations illustrate the BAM's various capacities using several types of encoding and recall situations.
1. Introduction
Being able to recognize and recall patterns of various natures and in different contexts is something that human beings accomplish routinely and with little effort, but these tasks are difficult to reproduce in artificial intelligent systems. A successful approach has consisted of distributing information over parallel networks of processing units, as done in biological neural networks. This brain-inspired, artificial neural network approach offers the ability to develop attractors for each pattern if feedback connections are allowed (e.g., [1, 2]). It also exhibits great stability and adaptability with regard to noise and pattern degradation, and it can perform generalization tasks. In particular, the Bidirectional Associative Memory (BAM) model has shown great promise for pattern recognition because of its capacity to be trained under a supervised or unsupervised scheme [3]. Given bipolar column-matrix pairs (Xk, Yk), k = 1, …, p, BAM learning is accomplished with a simple Hebbian rule, according to the equation

W = ∑_{k=1}^{p} Yk Xkᵀ = YXᵀ. (1.1)

In this expression, X and Y are matrices that represent the sets of bipolar pairs to be associated, and W is the weight matrix. To assure perfect storage and retrieval, the input patterns need to be orthogonal (XᵀX = I, with I the identity matrix; [4]). In that case, all the positive eigenvalues of the weight matrix will be equal and, when an input is presented for recall, the correct output can be retrieved. For example, if the input represents the encoded pattern Xk, then the output will be given by

WXk = YXᵀXk = Yk. (1.2)

The output will thus correspond to the encoded stimulus. Equation (1.1) uses a one-shot learning process, since Hebbian association is strictly additive. A more natural learning procedure would make the learning incremental but, then, the weight matrix would grow unbounded with the repetition of the input stimuli during learning. In addition, the eigenvalues would reflect the frequency of presentation of each stimulus.
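A minimal sketch of the Hebbian storage rule (1.1) and one-step recall (1.2) follows, assuming a pair of orthogonal bipolar patterns; the pattern values are illustrative and not taken from the paper.

```python
import numpy as np

# Two orthogonal 4-dimensional bipolar inputs, stored as columns of X,
# and their associated 2-dimensional outputs, stored as columns of Y.
X = np.array([[1,  1],
              [1, -1],
              [-1, 1],
              [-1, -1]])
Y = np.array([[1, -1],
              [-1, 1]])

# Hebbian sum of outer products, Eq. (1.1): W = sum_k Y_k X_k^T = Y X^T.
W = Y @ X.T

def recall(x):
    """One-step recall with a signum transmission function, Eq. (1.2)."""
    return np.sign(W @ x)
```

Because the columns of X are orthogonal, `recall(X[:, k])` returns exactly `Y[:, k]`; with correlated patterns the cross terms would no longer cancel, which is the failure mode discussed next.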
This property may be acceptable for orthogonal patterns, but it leads to disastrous results when the patterns are correlated. In that case, the weight matrix will be dominated by its first eigenvalue, and this will result in recalling the same pattern whatever the input. For correlated patterns, a compromise is to use a one-shot learning rule to limit the domination of the first eigenvalue, and to use a recurrent nonlinear transmission function that allows the network to filter out the different patterns during recall. Kosko's BAM effectively used a signum transmission function to recall noisy patterns, despite the fact that the weight matrix developed by using (1.1) is not optimal. The nonlinear transmission function usually used by the BAM network is expressed by the following equations:

y(t+1) = sgn(Wx(t)),
x(t+1) = sgn(Wᵀy(t+1)), (1.3)

where sgn is the signum function and t is a given discrete time step. Initially, BAMs had poor storage capacity, could only store binary patterns, and were limited to static inputs. Nowadays, storage and recall performance have much improved; BAMs can learn and recall binary patterns with good performance (e.g., [5–20]), and that capability has been extended to real-valued patterns (e.g., [7, 8, 21–24]). Attention has also been given to learning and retrieving temporal sequences (see [25] for a review). These multistep associations have been studied essentially with gradient descent techniques (e.g., [26, 27]), competitive techniques (e.g., [28]), or complex architectures (e.g., [29, 30]). Few authors have studied this type of association in the case of simple BAM networks (e.g., [7, 31–33]).
In this paper, the focus is on BAM properties for different kinds of associations. First, the transmission function will be studied both analytically and numerically. This analysis will be followed by simulations of the classic case of one-to-one association. The next section will present many-to-one association, which allows mean extraction from the presentation of different exemplars. Then, we focus on temporal association; in this case, both limit-cycle and steady-state behaviors will be presented. Finally, more complex architectures will be explored by using the Multidirectional Associative Memory (MAM) architecture while keeping the same learning and transmission functions.
2. Analysis of the Transmission Function
This section describes how the transmission function is derived from both analytical and numerical results on the one-dimensional cubic function.
2.1. One-Dimensional Symmetric Setting
The transmission function used in the model is based on the classic Verhulst equation [34]. Since this quadratic function has only one stable fixed point, it is extended to a cubic form described by the dynamic equation

dx/dt = λx − x³, (2.1)

where x represents the input value and λ is a free parameter that affects the equilibrium states of (2.1). Figure 1 illustrates its shape for λ = 1. The fixed points are the roots of λx − x³ = 0. For example, if λ = 1, the corresponding fixed points are −1, 0, and 1. The stability properties of the fixed points are determined by computing the derivative of (2.1):

d(dx/dt)/dx = λ − 3x². (2.2)

If the derivative at a given fixed point is greater than zero, then a small perturbation results in growth; if the derivative is negative, a small perturbation results in decay. The first situation represents an unstable fixed point whereas the second represents a stable one. In the previous example, both −1 and 1 are stable fixed points while 0 is an unstable fixed point. This is illustrated in Figure 1 by filled and empty circles, respectively.
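The fixed points and their stability can be checked numerically; the sketch below evaluates the cubic field and the sign of its derivative at each root, with λ = 1 as in the example above.

```python
import numpy as np

# Cubic dynamics dx/dt = lam*x - x**3 and its derivative lam - 3x**2,
# used here to classify the fixed points for lam = 1.
lam = 1.0

def f(x):
    return lam * x - x**3

def f_prime(x):
    return lam - 3 * x**2

# Roots of f: 0 and +/- sqrt(lam); negative derivative means stable.
fixed_points = [-np.sqrt(lam), 0.0, np.sqrt(lam)]
stable = [f_prime(x) < 0 for x in fixed_points]
```

The result reproduces the analysis: −1 and 1 are stable, 0 is unstable.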
Another way to visualize the dynamics of a recurrent neural network is based on the physical idea of energy [1]. The energy function, noted E, can be defined by

dx/dt = −dE/dx, (2.3)

where the negative sign indicates that the state vector moves downhill in the energy landscape. Using the chain rule, it follows from (2.3) that

dE/dt = (dE/dx)(dx/dt) = −(dx/dt)² ≤ 0. (2.4)

Thus, E decreases along trajectories or, in other words, the state vector globally converges towards lower energy states. Equilibrium occurs at locations where the vector field vanishes; local minima of E correspond to stable fixed points, and local maxima correspond to unstable ones. To find them, we need to find E such that

−dE/dx = λx − x³. (2.5)

The general solution is

E = x⁴/4 − λx²/2 + c, (2.6)

where c is an arbitrary constant (we use c = 0 for convenience). Figure 2 illustrates the energy function when λ = 1. The system exhibits a double-well potential with two stable equilibrium points (x = −1 and x = 1).
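The downhill property can be verified by simulating the dynamics with small Euler steps and tracking the energy; a sketch with λ = 1, c = 0, and an arbitrary starting point inside the right-hand well:

```python
import numpy as np

# Energy E(x) = x**4/4 - lam*x**2/2 (with c = 0) evaluated along an
# Euler-integrated trajectory of dx/dt = lam*x - x**3.
lam, h = 1.0, 0.01

def energy(x):
    return x**4 / 4 - lam * x**2 / 2

x = 0.3                       # arbitrary initial state in the right well
energies = [energy(x)]
for _ in range(2000):
    x += h * (lam * x - x**3)
    energies.append(energy(x))
```

The trajectory slides into the minimum at x = 1 while the recorded energies decrease monotonically, as (2.4) predicts.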
The parameter λ plays an important role in determining the number and positions of the fixed points. Figure 3 illustrates the situation. For λ less than zero, the system has only one (stable) fixed point, x = 0; for λ greater than zero, there are three fixed points, 0 and ±√λ, of which the first is unstable and the other two are stable. Finally, for λ equal to zero, we have a pitchfork bifurcation point. We deduce from the preceding that we must have λ > 0 for the system to store binary stimuli.
2.2. M-Dimensional Symmetric Setting
In an M-dimensional space, (2.1) takes a vector form and becomes

dx/dt = Wx − x⁽³⁾, (2.7)

where x = [x₁, …, x_M]ᵀ, x⁽³⁾ = [x₁³, …, x_M³]ᵀ, W is the weight matrix, and M is the network's dimension. When the weight matrix is diagonal, the system is uncoupled and the analysis is straightforward. As in the one-dimensional system, the fixed points are defined by the roots of (2.7). The stability properties of each one are determined by finding the eigenvalues of its Jacobian matrix. Depending on the eigenvalues, different types of fixed-point behaviors are obtained. For example, if M = 2 and W is the identity matrix, there will be nine fixed points, as illustrated in Figure 4. Four of them are stable: (1, 1), (1, −1), (−1, 1), and (−1, −1); they are indicated by filled circles. The other five are unstable and indicated by empty circles. Here also, the stability of the fixed points can be determined from the energy of the system. In matrix notation, the energy function is

E = −(1/2)xᵀWx + (1/4)∑_{i=1}^{M} xᵢ⁴. (2.8)

In the case of a bipolar pattern and if W is set to the identity matrix, the energy of the system will be equal to −M/4.
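The matrix-form energy above can be checked directly: with W set to the identity, every corner of the bipolar hypercube {−1, 1}^M sits at the same energy level, −M/4. A sketch for M = 2:

```python
import numpy as np

# Matrix-form energy E = -1/2 x^T W x + 1/4 sum(x_i^4), evaluated at the
# four bipolar corners of a 2-dimensional system with W = I.
def energy(x, W):
    return -0.5 * x @ W @ x + 0.25 * np.sum(x**4)

M = 2
W = np.eye(M)
corners = [np.array([a, b], dtype=float) for a in (-1, 1) for b in (-1, 1)]
levels = [energy(x, W) for x in corners]
```

All four levels equal −0.5, i.e., −M/4, which is consistent with the four equal wells shown in Figure 5.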
The function is plotted in Figure 5 for a two-dimensional system, using the previous value of W. The stable fixed points correspond to the local minima in the plot, and they partition the recall space into four equal wells. For example, if the desired correct output (fixed point) is (1, 1), the probability that this fixed point attracts a uniformly distributed pattern whose elements lie between −1 and 1 is 25%.
2.3. Numerical Approximation Using Euler’s Method
From the previous analysis, two stable fixed points exist when λ is positive. For simplicity, if all the elements of W are set to one, then (2.7) becomes

dx/dt = x − x⁽³⁾. (2.9)

This differential equation can then be approximated in discrete time by using the Euler method:

x(t+1) = x(t) + δ(x(t) − x(t)⁽³⁾), (2.10)

where δ is a small positive constant and t is a given discrete time step. Rearranging the terms yields

x(t+1) = (δ + 1)x(t) − δx(t)⁽³⁾. (2.11)

However, as it is, (2.11) does not reflect the impact of the weight connections. To take into account that the connection strength between units is modified by a learning rule, we pose a(t) = Wx(t), where W represents the weight connections. Equation (2.11) then becomes

x(t+1) = (δ + 1)a(t) − δa(t)⁽³⁾. (2.12)

This last function is illustrated in Figure 6. As the figure shows, if a(t) = 1, then x(t+1) will have the same value of 1.
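The Euler-derived transmission function (2.12) can be sketched as an elementwise map; with δ inside the stability range discussed next, bipolar values are fixed points and intermediate values are pulled toward ±1.

```python
import numpy as np

# Cubic transmission function x(t+1) = (delta+1)*a - delta*a**3, applied
# elementwise to the activation a(t) = W x(t).
delta = 0.1

def transmit(a):
    return (delta + 1) * a - delta * a**3

# a = 1 maps to 1 (and a = -1 to -1), as noted for Figure 6.
out = transmit(np.array([1.0, -1.0, 0.5]))

# Iterating the map from an intermediate value (weight fixed at 1)
# converges to the stable fixed point at 1.
x = 0.5
for _ in range(200):
    x = transmit(x)
```
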
The slope of the transmission function at a fixed point determines the latter's stability. In our case, it is desired that the fixed points have a stable, monotonic approach. Therefore, the slope must be positive and less than one [35]:

0 < dx(t+1)/dx(t) < 1. (2.13)

This condition is satisfied when 0 < δ < 0.5 for bipolar stimuli, since the slope at x = ±1 is then 1 − 2δ. Another way to analyze the behavior of the network over the whole range of δ is to perform a Lyapunov analysis. This analysis allows us to discriminate aperiodic (chaotic) attractors from periodic ones. The Lyapunov exponent for the case of a one-dimensional network is approximated by

ℓ ≈ (1/N) ∑_{t=1}^{N} ln |dx(t+1)/dx(t)|, (2.14)

where N is the number of network iterations, set to 10,000 to establish the approximation. Again, for simplicity, we perform the analysis in the case of independent units. In other words, the derivative term is obtained from (2.12) with W = 1, so that it is given by

dx(t+1)/dx(t) = (δ + 1) − 3δx(t)². (2.15)

In order to estimate the range of δ values for a given period, a bifurcation diagram can also be computed. Figure 7 illustrates both the Lyapunov and bifurcation analyses and shows that for values of δ less than 1.0, the network converges to a fixed-point attractor. At higher values (e.g., 1.6), however, the network may converge to an aperiodic attractor.
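A rough numerical estimate of the Lyapunov exponent of the scalar map can be sketched as follows; the weight is fixed at 1 as in the analysis, the initial state is arbitrary, and N is kept at 10,000 as in the text.

```python
import numpy as np

# Average of log |map derivative| along a trajectory of
# x(t+1) = (delta+1)*x - delta*x**3, as an estimate of the Lyapunov exponent.
def lyapunov(delta, x0=0.3, n=10_000):
    x, acc = x0, 0.0
    for _ in range(n):
        deriv = (delta + 1) - 3 * delta * x**2
        acc += np.log(abs(deriv) + 1e-15)   # guard against log(0)
        x = (delta + 1) * x - delta * x**3
    return acc / n

low = lyapunov(0.1)     # fixed-point regime: negative exponent expected
high = lyapunov(1.6)    # regime reported as aperiodic in Figure 7
```

In the fixed-point regime the estimate settles near ln(1 − 2δ) < 0, while the larger δ value yields a markedly higher exponent.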
2.4. Numerical Approximation Using the Fourth-Order Runge-Kutta Method
Another numerical approximation of the network's dynamics is obtained by using the Fourth-Order Runge-Kutta (FORK) method. Contrary to Euler's method, FORK uses an averaged estimation to approximate the next time step:

x(t+1) = x(t) + (δ/6)(k₁ + 2k₂ + 2k₃ + k₄),
k₁ = g(x(t)), k₂ = g(x(t) + (δ/2)k₁), k₃ = g(x(t) + (δ/2)k₂), k₄ = g(x(t) + δk₃), g(x) = x − x⁽³⁾, (2.16)

where δ is a small approximation parameter. Again, to take the weight connections into account, we pose a(t) = Wx(t), and (2.16) is applied to the activation a(t) instead of x(t). The resulting function is illustrated in Figure 8. As the figure shows, if a(t) = 1, then x(t+1) will have the same value of 1. Again, as for any nonlinear dynamic system, to guarantee that a given output converges to a fixed point, the slope of the transmission function must be positive and less than one, as in (2.13) [35]. This condition is again satisfied for sufficiently small values of δ with bipolar stimuli. Here also, bifurcation and Lyapunov exponent diagrams were computed. Figure 8 shows that if the value of δ is lower than 1.7, the network converges to a fixed point; if the value is too high, however, a given input converges to an aperiodic attractor.
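One plausible reading of the FORK step for the cubic dynamics is sketched below; the step size and starting value are illustrative, and the weight is taken as 1 so that the activation equals the state.

```python
import numpy as np

# One fourth-order Runge-Kutta (FORK) step for dx/dt = x - x**3.
def g(x):
    return x - x**3

def fork_step(x, delta):
    k1 = g(x)
    k2 = g(x + 0.5 * delta * k1)
    k3 = g(x + 0.5 * delta * k2)
    k4 = g(x + delta * k3)
    return x + (delta / 6) * (k1 + 2 * k2 + 2 * k3 + k4)

# Bipolar values are exact fixed points: every k term vanishes at x = 1.
fixed = fork_step(1.0, 0.05)

# A small delta gives a monotonic approach to the stable fixed point at +1.
x = 0.5
for _ in range(500):
    x = fork_step(x, 0.05)
```
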
2.5. Performance Comparison between Euler and FORK Approximations
The purpose of this simulation was to compare the performance of the Euler and FORK approximations. Although FORK gives a more precise approximation of the ordinary differential equation (2.7), we need to evaluate whether a better approximation translates into a larger radius of attraction.
2.5.1. Methodology
A low and a high memory load were used as a basis of comparison. For the low-memory-load situation, the first 10 correlated patterns were associated with themselves. The patterns are 7 × 7 pixel images of alphabetic characters, where a white pixel is given the value −1 and a black pixel the value 1. This led to a memory load equal to 20% (10/49) of the 49-dimensional space capacity. Normally, such a high load cannot be handled by Kosko's BAM, whose capacity limit is about 14%. For the high-memory-load situation, the associations were extended to all 26 correlated patterns. This last situation led to a memory load equal to 53% (26/49). Usually, around 50% is about the maximum load an optimized Hebbian-type associative memory can store without major performance decline [36]. Figure 9 illustrates the stimuli used for the simulation. The images were converted to vectors of 49 dimensions before being input to the network.
Both methods were tested with their output parameter δ set to 0.05 and 0.025. These values are much lower than the theoretical limit that was found analytically; the simulation results (not reported) show that a higher value makes the model unable to associate the patterns. The associations between patterns were accomplished using the learning rules (3.4) and (3.5) that will be described in the next section.
After the desired pattern pairs were associated, the radii of attraction obtained with Euler and FORK were evaluated on noisy recall tasks. The first task consisted of recalling noisy patterns obtained by adding random, normally distributed vectors to the patterns. Each noise vector was distributed with a mean of 0 and a standard deviation of σ, where σ represents the desired proportion of noise; for the simulation, σ varied from 0.1 to 1.0. The second task was to recall the correct associated stimulus from a noisy input obtained by randomly flipping pixels in the input pattern. The number of pixel flips varied from 0 to 10, corresponding to a noise proportion of 0 to 20%.
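The two noise models can be sketched as follows; the pattern below is a randomly generated 49-dimensional bipolar stand-in for the character images, and the noise levels are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def add_gaussian_noise(pattern, sigma):
    """Task 1: add zero-mean normal noise with standard deviation sigma."""
    return pattern + rng.normal(0.0, sigma, size=pattern.shape)

def flip_pixels(pattern, n_flips):
    """Task 2: multiply n distinct, randomly chosen pixels by -1."""
    noisy = pattern.copy()
    idx = rng.choice(pattern.size, size=n_flips, replace=False)
    noisy[idx] *= -1
    return noisy

# One 49-dimensional bipolar pattern (stand-in for a 7 x 7 character image).
clean = np.where(rng.standard_normal(49) >= 0, 1.0, -1.0)
flipped = flip_pixels(clean, 10)
```
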
2.5.2. Results
Figure 10 illustrates the various performances. If the memory load is low (10 patterns over 49 pixels), as shown in the left column, then Euler's approximation has a slight advantage over FORK for random pixel flipping. This advantage increases if the memory load is high (26 patterns over 49 pixels), as shown in the right column. Moreover, the higher the value of the output parameter, the higher the performance. However, if the output parameter is set too high, then the network might not converge to a fixed point. Therefore, the output transmission function used in the remainder of this paper is based on Euler's method.
3. Model Description
3.1. Architecture
The proposed BAM architecture is similar to the one proposed by Hassoun [37], as illustrated in Figure 11, where x(0) and y(0) represent the initial input states (stimuli), t is the number of iterations over the network, and W and V are weight matrices. The network is composed of two Hopfield-like neural networks interconnected in head-to-toe fashion. These interconnected layers allow a recurrent flow of information that is processed bidirectionally: the y-layer returns information to the x-layer and vice versa. If similar patterns are presented at both layers, the network acts like an autoassociative memory; if the patterns at the x-layer are associated with different ones at the y-layer, the network acts like a heteroassociative memory [3]. As a result, it encompasses both unsupervised (self-supervised) and supervised learning. In this particular model, the two layers can be of different dimensions and, contrary to usual BAM designs, the weight matrix on one side is not necessarily the transpose of that on the other side.
3.2. Transmission Function
Based on the numerical results of Section 2.5.2, the transmission function is expressed by the following equations. The input activations to each of the x- and y-layers are computed as follows:

a(t) = Wx(t),
b(t) = Vy(t). (3.1)

The outputs are then obtained using Euler's approximation (2.12), defined by the following equations:

yᵢ(t+1) = (δ + 1)aᵢ(t) − δaᵢ(t)³, i = 1, …, N,
xⱼ(t+1) = (δ + 1)bⱼ(t) − δbⱼ(t)³, j = 1, …, M, (3.2)

where N and M are the numbers of units in each layer, i and j are the indices of the respective vector elements, x(t) and y(t) represent the layers' contents at time t, and δ is a general output parameter. The shape of this function is illustrated in Figure 6. This function has the advantage of exhibiting continuous-valued (or gray-level) attractor behavior (for a detailed example, see [7]) when used in a recurrent network. Such properties contrast with standard nonlinear transmission functions, which only exhibit bipolar attractor behavior (e.g., [3]).
3.3. Learning
The network tries to solve the following nonlinear constraints:

y = f(Wx),
x = f(Vy), (3.3)

where f is the transmission function defined previously (3.2). The form of the constraints and the recurrent nature of the underlying network call for a learning process that is executed online: the weights are modified as a function of the input and the obtained output. In addition, since incremental learning is favored, the network must be able to self-converge. As a result, the learning is based on time-difference Hebbian association [7, 38–40]. It is formally expressed by the following equations:

W(k+1) = W(k) + η(y(0) − y(t))(x(0) + x(t))ᵀ, (3.4)
V(k+1) = V(k) + η(x(0) − x(t))(y(0) + y(t))ᵀ, (3.5)

where η represents the learning parameter. In (3.4) and (3.5), the weight updates follow this general procedure: first, initial inputs x(0) and y(0) are fed to the network; then, those inputs are iterated t times through the network. This results in outputs x(t) and y(t) that are used for the weight updates. Therefore, the weights will self-stabilize when the feedback is the same as the initial inputs (y(t) = y(0) and x(t) = x(0)), in other words, when the network has developed fixed points. This contrasts with most BAMs, where the learning is performed solely on the activation (offline). Learning convergence depends on the value of the learning parameter η. For simplicity, we assume that input and output are the same (x = y). To find the maximum value that the learning parameter can take, we require that the derivative of the learning equation have a positive slope less than one [35]:

0 < dW(k+1)/dW(k) < 1. (3.6)

If t = 1, then in the case of a one-dimensional pattern in a one-dimensional network, the situation simplifies to

w(k+1) = w(k) + η(1 − x(1)²), x(0) = 1. (3.7)

In this case, the network converges when the slope condition holds, and the solution is

η < 1/(2(1 − 2δ)), δ ≠ 1/2. (3.8)

As for any network, η must be set to lower values as the network increases in dimensionality. Therefore, in the case of a BAM of N and M dimensions, the learning parameter must be set according to

η < 1/(2(1 − 2δ)Max[N, M]), δ ≠ 1/2. (3.9)
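One weight update under the time-difference Hebbian rules (3.4) and (3.5) can be sketched as follows; the vectors and parameter value are illustrative. The sketch also shows the self-stabilization property: when the network's feedback already matches the initial inputs, the update vanishes.

```python
import numpy as np

# Time-difference Hebbian updates:
#   W <- W + eta * (y(0) - y(t)) (x(0) + x(t))^T
#   V <- V + eta * (x(0) - x(t)) (y(0) + y(t))^T
eta = 0.01

def update(W, V, x0, y0, xt, yt):
    W = W + eta * np.outer(y0 - yt, x0 + xt)
    V = V + eta * np.outer(x0 - xt, y0 + yt)
    return W, V

x0 = np.array([1.0, -1.0, 1.0])
y0 = np.array([1.0, 1.0])
W = np.zeros((2, 3))
V = np.zeros((3, 2))

# Early in learning the feedback differs from the input: weights move.
W1, V1 = update(W, V, x0, y0, np.zeros(3), np.zeros(2))

# At a developed fixed point (x(t) = x(0), y(t) = y(0)) the update is zero.
W2, V2 = update(W1, V1, x0, y0, x0.copy(), y0.copy())
```
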
4. Simulations
The following simulations first provide a numerical example to illustrate how learning and recall are performed with a small network. The next simulation then reproduces a classic BAM one-to-one association task. Finally, the third simulation extends the association properties of the model to a many-to-one association task.
4.1. Numerical Example
The first simulation uses a toy example to show in detail how the learning and recall processes are performed.
4.1.1. Learning
For this simulation, the transmission function parameter (δ) was set to 0.1 and the learning parameter (η) was set to 0.01. Also, to limit the simulation time, the number of output iterations before each weight matrix update was set to t = 1 for all simulations. Learning was carried out according to the following procedure:
(1) weight initialization at zero;
(2) random selection of a pattern pair;
(3) computation of x(t) and y(t) according to the transmission function (3.2);
(4) update of the weight matrices (W and V) according to (3.4) and (3.5);
(5) repetition of steps (2) to (4) until the weight matrices converge.
The stimuli pairs for the simulation are given by
First Learning Trial
Assume that the second pair is randomly selected. Then,
and (3.2) yield zero-vectors for y(1) and x(1), since the weight connections are initially set at zero:
Using (3.4) and (3.5), the weight matrices are updated and their values are
Second Learning Trial
Assume that the first pair is randomly selected. Therefore,
Using (3.2) we compute y(1) and x(1) to get
Using (3.4) and (3.5), the weight matrices are updated and their values are now
Notice that the weight matrices are not the transposes of each other, unlike in Kosko's model [3].
Subsequent Learning Trials
If the process is iterated over and over (e.g., 100 learning trials), the weight matrices converge to
4.1.2. Recall
To test whether the network can effectively recall a given pattern, the pattern is selected and iterated through the network until stabilization. For example, if the following pattern (a noisy version of pattern 2) is presented to the network, the successive state vectors are computed as follows. The output of the second layer gives y(1); this output is then sent back to the first layer to give x(1), and the cycle is repeated until convergence. The new input is thus classified as part of the second pattern pair. The next section presents a classic example of one-to-one association.
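The recall cycle just described can be sketched as an iteration until the x-layer state stabilizes; the matrices and probe below are illustrative stand-ins (not the learned values of the numerical example).

```python
import numpy as np

# Bidirectional recall: y(t+1) = f(W x(t)), x(t+1) = f(V y(t+1)),
# iterated until the x-layer state stops changing.
delta = 0.1

def f(a):
    return (delta + 1) * a - delta * a**3

def recall(x, W, V, max_iter=100, tol=1e-8):
    for _ in range(max_iter):
        y = f(W @ x)
        x_new = f(V @ y)
        if np.max(np.abs(x_new - x)) < tol:
            break
        x = x_new
    return x, y

# Illustrative stand-in weights (identity) and a degraded bipolar probe.
probe = np.array([0.8, -0.6])
x_fin, y_fin = recall(probe, np.eye(2), np.eye(2))
```

With these stand-in weights, the degraded probe is cleaned up to the bipolar attractor (1, −1).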
4.2. OnetoOne Association
The task of the network consists of associating a given picture with a name. Therefore, the network should output the corresponding name from a picture, and the corresponding picture from a name.
4.2.1. Methodology
The first stimulus set represents 8-bit gray-level pictures; each image forms a 912-dimensional real-valued vector. They are reduced-size versions of the California Facial Expressions set (CAFE, [41]). Each pixel was normalized to values between −1 and 1. The second set consists of 4-letter words on a grid that identify each picture. For each name, a white pixel is assigned a value of −1 and a black pixel a value of +1; each name forms a 217-dimensional bipolar vector. Each weight matrix therefore has 912 × 217 connections. The network task was to associate each image with its corresponding name, as depicted in Figure 12. The learning parameter (η) was set to 0.0025 and the output parameter (δ) to 0.1. Both values meet the requirements for weight convergence and fixed-point development. Since the model's learning is online and iterative, the stimuli were not presented all at once. In order to save computational time, the number of iterations before each weight update was set to t = 1. The learning followed the same general procedure described in the previous simulation:
(1) initialization of weights to zero;
(2) random selection of a pair following a uniform distribution;
(3) stimuli iteration through the network according to (3.1) and (3.2) (one cycle);
(4) weight update according to (3.4) and (3.5);
(5) repetition of steps (2) to (4) until the desired number of learning trials is reached.
4.2.2. Results
It took about 12,000 learning trials before the learning converged. The network was then able to perfectly associate a name with the corresponding picture, and vice versa. Tables 1 to 4 illustrate some examples of the noise tolerance and pattern completion properties of the network. More precisely, Table 1 shows how the network was able to recall the proper name from a noisy input. The input consisted of the first picture contaminated with a noise level of 22%; in other words, 200 pixels (out of 912) were randomly selected and their values were multiplied by −1.

In addition, because of the bidirectional nature of the network, it is possible not only to recall the appropriate name but also to clean the noisy input. Table 2 shows an example of pattern completion; in this case, the eye band consisted of pixels of zero value. The network was able to correctly output the name and restore the missing eyes. The network can also output a picture given an appropriately noisy name. For example, Table 3 shows that the network was able to recall the appropriate name even though the input name was incorrect (Kate instead of Katy). Finally, Table 4 shows that since "Pete" is the only name that begins with a "P", only the first letter is necessary for the network to output the correct face. The general performance of the network has been evaluated by Chartier and Boukadoum [7]. The next section extends the simulations by introducing many-to-one associations.



4.2.3. Transmission Function with Hard Limits
For some applications (e.g., [7]), it is desirable that the transmission function lie within a fixed range of values. Saturating limits at −1 and 1 can be added to (3.2), and the transmission function is then expressed by the following equations:

yᵢ(t+1) = 1, if aᵢ(t) > 1,
yᵢ(t+1) = −1, if aᵢ(t) < −1,
yᵢ(t+1) = (δ + 1)aᵢ(t) − δaᵢ(t)³, otherwise,

and likewise for xⱼ(t+1) as a function of bⱼ(t). In contrast to a sigmoid transmission function, this function is not asymptotic at the ±1 values, and it still has the advantage of exhibiting continuous-valued (or gray-level) attractor behavior when used in a recurrent network. Figure 13 illustrates the shape of the transmission function when δ = 0.5.
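A sketch of the hard-limited variant follows, assuming the saturation test applies to the activation: activations beyond ±1 are clamped while the cubic form is kept in between. The sample values are illustrative.

```python
import numpy as np

# Transmission function with hard limits: +/-1 outside the [-1, 1]
# activation range, cubic transmission inside it.
def transmit_hard(a, delta=0.5):
    cubic = (delta + 1) * a - delta * a**3
    return np.where(a > 1, 1.0, np.where(a < -1, -1.0, cubic))

vals = transmit_hard(np.array([-2.0, -1.0, 0.5, 1.0, 2.0]))
```

Unlike a plain clip of the cubic output, this piecewise form keeps large activations saturated at the correct sign (the cubic itself turns back down past ±1).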
We compared the same network with and without hard limits on the task previously described in Section 2.5.1 (Figure 9). The performance of the network was evaluated for transmission function parameter (δ) values between 0.05 and 0.3; values outside this range gave worse results. In addition, the network was compared with the best results (δ = 0.05) obtained from the simulation illustrated in Figure 10, where no hard limits were used. After the association of the desired pattern pairs, the radius of attraction of the network was evaluated under noisy inputs obtained by randomly flipping pixels in the input pattern. The number of pixel flips varied from 0 to 10.
Figure 14 illustrates the various performances. The results show that under low to medium noise, the network performs best (about a 10% increase) when no hard limits are used. Under medium-to-high levels of noise, however, the situation is reversed: the network performs best (about a 5% increase) when hard limits are used.
4.3. ManytoOne Association
This simulation illustrates many-to-one association. The idea is to associate different emotions, depicted by different people, with the proper emotion tag. Therefore, upon the presentation of a given image, the network should output the corresponding emotion. The simulation is based on Tabari et al. [42].
4.3.1. Methodology
As in the previous section, the first stimulus set represents 8-bit gray-level pictures; each image forms a 912-dimensional real-valued vector. They are reduced-size versions of the California Facial Expressions sample (CAFE, [41]). Each pixel was rescaled to values between −1 and 1. For each of the 9 individuals, 7 images reflect a given emotion (anger, disgust, fear, happy, maudlin, neutral, and surprised). The second set consists of letters placed on a grid that identify each emotion (A, D, F, H, M, N, and S); for each letter, a white pixel is assigned a value of −1 and a black pixel a value of +1. Each letter forms a 49-dimensional bipolar vector. Each weight matrix therefore has 912 × 49 connections. The network task was to associate each emotion, expressed by different persons, with the corresponding letter (Figure 15). The learning parameter (η) was set to 0.0025 and the output parameter (δ) to 0.1. Both values meet the requirements for weight convergence and fixed-point behavior. The learning followed the general procedure described previously.
4.3.2. Results
Once the learning trials were finished (about 1000 epochs), the network was able to correctly recall each emotion from the different people expressing it. As in one-to-one association, the network was able to remove the noise and correctly recall the appropriate emotion tag. Table 5 shows two examples of noisy images that were contaminated by 200 pixel flips (22%). Table 6 shows that the network recalled the appropriate emotion tag even in the absence of picture parts that are usually deemed important for identifying emotions, that is, the eyes and the mouth.


However, the most important property in this context is that the two weight matrices are dynamically linked together. As shown before, the network will output the corresponding letter given a face, but which face should the network output for a given letter? In this case, going from the letter layer to the picture layer implies one-to-many association. Without an architecture modification that allows contextual encoding [7], the network cannot output the correct pictures; rather, it will average all the individuals' expressions of each emotion. Therefore, as shown in Figure 16, the network is able to extract the features that make each emotion different in the images.
5. Model Extension to Temporal Associative Memory
Until now, the associations between the different stimuli have been static. In some situations, the network can also perform multistep pattern recognition [43]; such is the case when the output is a time series. To allow such encoding, the network's architecture must be modified.
5.1. Architecture Modification
Figure 17 shows that, instead of having two heteroassociative networks connected together as in the Okajima et al. model [44], the network is composed of a heteroassociative and an autoassociative layer. The heteroassociative layer maps a given pattern to the next one, whereas the autoassociative layer acts like a time-delay circuit that feeds the initial input back to the heteroassociative layer. With this delay, it becomes possible for the overall network to learn temporal pattern sequences. Figure 17 illustrates the difference between a temporal associative memory and a bidirectional associative memory.
The initial input pattern is presented to the heteroassociative part and, in parallel, to the autoassociative part. The latter yields a delayed copy of the input that is fed back into the heteroassociative part as well as into the autoassociative part itself. Thus, the autoassociative part serves as a context unit for the heteroassociative part. These two networks work in synergy to allow the necessary flow of information for, as our simulations will show, multistep pattern recall while still being able to filter noise out of the inputs.
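The temporal flow above can be sketched with stand-in weights: a heteroassociative map built to advance a short cyclic sequence by one step, and an identity-like autoassociative map replaying the input as context. The four mutually orthogonal bipolar patterns are illustrative, not learned.

```python
import numpy as np

delta = 0.1

def f(a):
    """Cubic transmission function applied elementwise."""
    return (delta + 1) * a - delta * a**3

# Four mutually orthogonal bipolar patterns (rows), forming a cycle
# p0 -> p1 -> p2 -> p3 -> p0.
P = np.array([[ 1,  1,  1,  1],
              [ 1, -1,  1, -1],
              [ 1,  1, -1, -1],
              [ 1, -1, -1,  1]], dtype=float)

# Heteroassociative stand-in mapping each pattern to the next one;
# autoassociative stand-in acting as the delayed identity feedback.
W_hetero = sum(np.outer(P[(i + 1) % 4], P[i]) for i in range(4)) / 4.0
W_auto = np.eye(4)

x = P[0]
sequence = [x]
for _ in range(4):
    y = f(W_hetero @ x)        # next pattern in the sequence
    context = f(W_auto @ x)    # delayed feedback of the current input
    x = y
    sequence.append(x)
```

Starting from p0, the recalled states cycle through p1, p2, p3 and return to p0, which is the limit-cycle behavior exercised in the simulations below.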
5.2. Simplification of the Learning Function
In the case of autoassociative learning, the function expressed by (3.5) can be simplified by using the fact that the weight matrix is then square and symmetric. The symmetry property has the effect of canceling the cross terms in (3.5), and the autoassociative learning function becomes

W(k+1) = W(k) + η(x(0)x(0)ᵀ − x(t)x(t)ᵀ), (5.1)

where W represents the connection weight matrix. The learning rule is then the sum of a positive and a negative Hebbian term [7, 38].
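A sketch of this simplified autoassociative update follows; the pattern and learning rate are illustrative. The update preserves the symmetry of the weight matrix and, as with the bidirectional rule, vanishes once the feedback reproduces the initial input.

```python
import numpy as np

# Simplified autoassociative rule: a positive Hebbian term on the initial
# state x(0) and a negative one on the network's feedback x(t).
eta = 0.005

def auto_update(W, x0, xt):
    return W + eta * (np.outer(x0, x0) - np.outer(xt, xt))

x0 = np.array([1.0, -1.0, 1.0, -1.0])
W = np.zeros((4, 4))

# Early in learning the feedback is far from x(0): the weights move.
W = auto_update(W, x0, np.zeros_like(x0))
```
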
5.3. Simulation 1: Limit Cycle Behavior
This simulation illustrates how the network can perform temporal association. The task consists of learning different planar rotations of the same object.
5.3.1. Methodology
Five different pattern sequences must be learned. Each sequence is composed of successive 45° planar rotations of the same image; Table 7 illustrates the five pattern sequences. The network has to associate each pattern of a sequence with the following one. In other words, the network will associate the 45° image with the 90° one, the 90° with the 135°, the 135° with the 180°, the 180° with the 225°, the 225° with the 270°, the 270° with the 315°, the 315° with the 0°, and the 0° with the 45° image. Therefore, the steady state of the network should be a limit cycle of period 8. Each binary stimulus was placed on a grid, where white and black pixels were assigned −1 and 1 values, respectively. The free parameters η and δ were set in accordance with the specifications given in Chartier et al. [8].
5.3.2. Results
The network took about 2000 learning trials to converge and was able to correctly learn the 5 pattern sequences. It was also able to operate with noisy inputs and to perform some generalization. For instance, Table 8 illustrates how the network removed noise from the "fish" image as the temporal sequence was recalled. Table 9 shows that the network can generalize its learning to similar objects; in this case, a new "butterfly" picture was used as the initial input. As the stimulus iterated through the network, the network recalled the original "butterfly" depicted in Table 7 (the butterfly output at step 8 is different from the original one at step 0). Finally, Table 10 shows that the network can correctly output the image sequence under small planar rotation variations of the initial input image. In this particular example, the new "dinosaur" picture represents a 275° rotation (270° + 5°). Although that particular rotation was not part of the initial sequence set, the network was able to correctly recall the appropriate picture sequence.



5.4. Simulation 2: FixedPoint Behavior
Again, this simulation illustrates how the network can perform temporal association; the task is to learn different planar rotations of the same object. Contrary to the previous simulation, however, once the network has cycled through every planar rotation it converges to a fixed point.
5.4.1. Methodology
Five different pattern sequences must be learned. Each sequence is composed of the same image rotated in the plane in 45° increments. Table 11 illustrates the five pattern sequences. The network must associate each pattern of a sequence with the following one: the 45° pattern with the 90° one, the 90° with the 135°, the 135° with the 180°, the 180° with the 225°, the 225° with the 270°, the 270° with the 315°, the 315° with the 0°, and the 0° image with itself. This last association creates a fixed-point attractor that can then be used for other types of association in more complex architectures, as will be shown in the next section. Each binary stimulus was placed on a grid, where white and black pixels were assigned −1 and 1 values, respectively. The free parameters were set in accordance with the specification given in [8].
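The fixed-point variant differs from the limit-cycle toy only in the last association. As before, this is a hypothetical sketch using orthogonal bipolar stand-in patterns, one-shot Hebbian storage, and `np.sign` in place of the paper's learning and transmission functions.

```python
import numpy as np

# Same eight orthogonal bipolar stand-in patterns as in the limit-cycle toy.
H2 = np.array([[1, 1], [1, -1]])
patterns = np.kron(np.kron(H2, H2), H2)

# Associate each pattern with the next, but let the LAST one map to itself:
# the sequence now terminates in a fixed-point attractor instead of cycling.
W = sum(np.outer(patterns[i + 1], patterns[i]) for i in range(7))
W = W + np.outer(patterns[7], patterns[7])   # last pattern -> itself

x = patterns[0]
for _ in range(12):                          # more steps than stored patterns
    x = np.sign(W @ x)                       # recall settles on patterns[7]
```

Starting from the first pattern, recall steps through the sequence and then stays put: once `patterns[7]` is reached, every further iteration reproduces it, which is exactly the fixed-point behavior exploited in the next section.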
5.4.2. Results
The network took about 2000 learning trials before convergence occurred, and it was able to correctly learn the 5 pattern sequences. The same examples of noise tolerance and generalization were used to compare the network's performance between the limit-cycle and the fixed-point conditions. Table 12 shows that the network can eliminate noise as the sequence is recalled. Table 13 shows learning generalization to other similar objects: after 9 time steps, the network recalled the "butterfly" depicted in Table 11. Finally, Table 14 shows that the network can also output the correct image sequence under small initial variations of planar rotation. In this particular example, the "dinosaur" picture represents a 275° rotation, as in the previous section.



6. Multidirectional Associative Memories
Sometimes associations must be generalized to more than two directions. To deal with multiple associations, Hagiwara [45] proposed the Multidirectional Associative Memory (MAM). The architecture was later extended to deal with temporal sequences [46]. This section shows some properties of a MAM composed of one temporal heteroassociative memory and one bidirectional associative memory. As in the previous simulations, only the network topology is modified, as Figure 18 shows; the learning and transmission functions remain the same. In this simulation, the information received by the y-layer at a given time step consists of the heteroassociative contribution and the feedback from the z-layer. The z-layer feedback is useful only once the network reaches a steady state, the last pattern of the stimulus set. Therefore, to minimize its effect during temporal recall, the weight given to the z-layer feedback is set lower than that of the heteroassociative contribution; the y-layer input is thus a weighted sum of the two contributions, computed with the corresponding connection weight matrices.
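The combination rule can be sketched as follows. This is a hypothetical illustration: `Wxy`, `Wzy`, and the feedback parameter `delta` stand for the connection weight matrices and the small z-layer weighting described above; the paper's actual symbols and values are not reproduced here.

```python
import numpy as np

def y_layer_input(Wxy, Wzy, x, z, delta=0.1):
    """The y-layer receives the heteroassociative contribution (Wxy @ x)
    plus the z-layer feedback (Wzy @ z) scaled by a small delta, so the
    feedback only becomes decisive once the temporal recall has settled
    on its fixed point."""
    return Wxy @ x + delta * (Wzy @ z)
```

Keeping `delta` small implements the design choice argued below: a strong z-layer feedback could deflect the output trajectory before the fixed point is reached and make the network converge to the wrong attractor.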
6.1. Methodology
The simulation performs the multi-step pattern recognition described in the previous section in combination with a standard one-to-one association. The one-to-one association is performed only when the network is at a given fixed point; in other words, the association occurs only when the output is at the 0° planar rotation of the recalled sequence (Table 11). The association is made between the given picture and the first letter of its name, as illustrated in Figure 19. The learning and transmission function parameters were kept at their previous values, and the z-layer feedback parameter was set to a low value.
6.2. Results
Of course, if the temporal or bidirectional associative sections are activated independently, the resulting network performance is the same as in the previous sections. What matters in this case is the output state of the network. The best case is when the pattern presented to the network is close to its fixed-point behavior (315°); the effect of the feedback coming from the z-layer is then limited. Conversely, the worst case is when a pattern with a 45° planar rotation is presented: 8 time steps are then needed before the output vector reaches its fixed-point behavior, and only from that point on can the picture identification process occur. If the feedback coming from the z-layer is too strong, it could deflect the output vector's trajectory, making the network converge to the wrong fixed point and output an incorrect name tag. Because of this, the feedback value of the z-layer must be kept low.
Table 15 shows an example of the "runner" given an initial 45° planar rotation. After one time step, the network correctly outputs the 90° picture (Out 1); of course, the name tag at this point is incorrect (Out 2). The new input then corresponds to the output of the y-layer (Out 1) combined with the feedback given by the z-layer (Out 2); the overall effect, Out 1 + Out 2, is used as the new input. The process is repeated until convergence. After 9 time steps, Table 15 shows that the correct 0° "runner" picture is output, together with the correct name-tag letter.

7. Conclusion
Several interesting properties of a BAM architecture have been shown. First, it was shown that simple time-delay Hebbian learning can perform one-to-one and many-to-one associations with both binary and real-valued patterns. In addition, the BAM can be modified in such a way that both heteroassociation and autoassociation are accomplished within the same architecture. In this case, the autoassociative part acts as a time delay, and the overall network can be used for temporal association. A simple case of one-step delay was shown; however, by adding l extra layers of autoassociation, the network can easily be modified to handle l-step delays. Finally, if a combination of temporal and bidirectional associations is grouped into a multidirectional associative memory, more complex behaviors can be obtained.
In all cases, the same learning and transmission functions are used; the only modification concerns the network topology. This property gives the network a high internal consistency. More complex architectures are possible, but a really powerful practical improvement would be an algorithm that guides the architecture's growth based on the problem to be solved. The architecture could then be modified as a function of the task to be performed and the desired behaviors, using BAMs as building blocks.
Acknowledgment
This research was supported in part by the Natural Sciences and Engineering Research Council of Canada (NSERC).
References
[1] J. J. Hopfield, "Neural networks and physical systems with emergent collective computational abilities," Proceedings of the National Academy of Sciences of the United States of America, vol. 79, no. 8, pp. 2554–2558, 1982.
[2] J. A. Anderson, J. W. Silverstein, S. A. Ritz, and R. S. Jones, "Distinctive features, categorical perception, and probability learning: some applications of a neural model," Psychological Review, vol. 84, no. 5, pp. 413–451, 1977.
[3] B. Kosko, "Bidirectional associative memories," IEEE Transactions on Systems, Man and Cybernetics, vol. 18, no. 1, pp. 49–60, 1988.
[4] T. Kohonen, "Correlation matrix memories," IEEE Transactions on Computers, vol. 21, no. 4, pp. 353–359, 1972.
[5] M. E. Acevedo-Mosqueda, C. Yáñez-Márquez, and I. López-Yáñez, "Alpha-Beta bidirectional associative memories: theory and applications," Neural Processing Letters, vol. 26, no. 1, pp. 1–40, 2007.
[6] S. Arik, "Global asymptotic stability analysis of bidirectional associative memory neural networks with time delays," IEEE Transactions on Neural Networks, vol. 16, no. 3, pp. 580–586, 2005.
[7] S. Chartier and M. Boukadoum, "A bidirectional heteroassociative memory for binary and grey-level patterns," IEEE Transactions on Neural Networks, vol. 17, no. 2, pp. 385–396, 2006.
[8] S. Chartier, P. Renaud, and M. Boukadoum, "A nonlinear dynamic artificial neural network model of memory," New Ideas in Psychology, vol. 26, no. 2, pp. 252–277, 2008.
[9] S. Du, Z. Chen, Z. Yuan, and X. Zhang, "Sensitivity to noise in bidirectional associative memory (BAM)," IEEE Transactions on Neural Networks, vol. 16, no. 4, pp. 887–898, 2005.
[10] T. D. Eom, C. Choi, and J. J. Lee, "Generalized asymmetrical bidirectional associative memory for multiple association," Applied Mathematics and Computation, vol. 127, no. 2-3, pp. 221–233, 2002.
[11] H. Oh and S. C. Kothari, "Adaptation of the relaxation method for learning in bidirectional associative memory," IEEE Transactions on Neural Networks, vol. 5, no. 4, pp. 576–583, 1994.
[12] C.-S. Leung, "Optimum learning for bidirectional associative memory in the sense of capacity," IEEE Transactions on Systems, Man and Cybernetics, vol. 24, no. 5, pp. 791–796, 1994.
[13] D. Shen and J. B. Cruz Jr., "Encoding strategy for maximum noise tolerance bidirectional associative memory," IEEE Transactions on Neural Networks, vol. 16, no. 2, pp. 293–300, 2005.
[14] H. Shi, Y. Zhao, and X. Zhuang, "A general model for bidirectional associative memories," IEEE Transactions on Systems, Man, and Cybernetics B, vol. 28, no. 4, pp. 511–519, 1998.
[15] Y. F. Wang, J. B. Cruz Jr., and J. H. Mulligan Jr., "Two coding strategies for bidirectional associative memory," IEEE Transactions on Neural Networks, vol. 1, no. 1, pp. 81–92, 1990.
[16] T. Wang, X. Zhuang, and X. Xing, "Weighted learning of bidirectional associative memories by global minimization," IEEE Transactions on Neural Networks, vol. 3, no. 6, pp. 1010–1018, 1992.
[17] L. Wang and X. Zou, "Capacity of stable periodic solutions in discrete-time bidirectional associative memory neural networks," IEEE Transactions on Circuits and Systems II: Express Briefs, vol. 51, no. 6, pp. 315–319, 2004.
[18] Y. Wu and D. A. Pados, "A feedforward bidirectional associative memory," IEEE Transactions on Neural Networks, vol. 11, no. 4, pp. 859–866, 2000.
[19] Z. B. Xu, Y. Leung, and X. W. He, "Asymmetric bidirectional associative memories," IEEE Transactions on Systems, Man and Cybernetics, vol. 24, no. 10, pp. 1558–1564, 1994.
[20] X. Zhuang, Y. Huang, and S. Chen, "Better learning for bidirectional associative memory," Neural Networks, vol. 6, no. 8, pp. 1131–1146, 1993.
[21] G. Costantini, D. Casali, and R. Perfetti, "Neural associative memory storing gray-coded gray-scale images," IEEE Transactions on Neural Networks, vol. 14, no. 3, pp. 703–707, 2003.
[22] C.-C. Wang, S.-M. Hwang, and J.-P. Lee, "Capacity analysis of the asymptotically stable multi-valued exponential bidirectional associative memory," IEEE Transactions on Systems, Man, and Cybernetics B, vol. 26, no. 5, pp. 733–743, 1996.
[23] D. Zhang and S. Chen, "A novel multi-valued BAM model with improved error-correcting capability," Journal of Electronics, vol. 20, no. 3, pp. 220–223, 2003.
[24] J. M. Zurada, I. Cloete, and E. van der Poel, "Generalized Hopfield networks for associative memories with multi-valued stable states," Neurocomputing, vol. 13, no. 2–4, pp. 135–149, 1996.
[25] L. Wang, "Multi-associative neural networks and their applications to learning and retrieving complex spatio-temporal sequences," IEEE Transactions on Systems, Man, and Cybernetics B, vol. 29, no. 1, pp. 73–82, 1999.
[26] J. L. Elman, "Finding structure in time," Cognitive Science, vol. 14, no. 2, pp. 179–211, 1990.
[27] H. Wakuya and J. M. Zurada, "Bidirectional computing architecture for time series prediction," Neural Networks, vol. 14, no. 9, pp. 1307–1321, 2001.
[28] G. Bradski, G. A. Carpenter, and S. Grossberg, "STORE working memory networks for storage and recall of arbitrary temporal sequences," Biological Cybernetics, vol. 71, no. 6, pp. 469–480, 1994.
[29] B. B. Nasution and A. I. Khan, "A hierarchical graph neuron scheme for real-time pattern recognition," IEEE Transactions on Neural Networks, vol. 19, no. 2, pp. 212–229, 2008.
[30] J. A. Starzyk and H. He, "Anticipation-based temporal sequences learning in hierarchical structure," IEEE Transactions on Neural Networks, vol. 18, no. 2, pp. 344–358, 2007.
[31] D.-L. Lee, "A discrete sequential bidirectional associative memory for multistep pattern recognition," Pattern Recognition, vol. 19, pp. 1087–1102, 1988.
[32] L. Wang, "Heteroassociations of spatio-temporal sequences with the bidirectional associative memory," IEEE Transactions on Neural Networks, vol. 11, no. 6, pp. 1503–1505, 2000.
[33] R. W. Zhou and C. Quek, "DCBAM: a discrete chainable bidirectional associative memory," Pattern Recognition Letters, vol. 17, no. 9, pp. 985–999, 1996.
[34] A. A. Koronovskii, D. I. Trubetskov, and A. E. Khramov, "Population dynamics as a process obeying the nonlinear diffusion equation," Doklady Earth Sciences, vol. 372, pp. 755–758, 2000.
[35] D. Kaplan and L. Glass, Understanding Nonlinear Dynamics, Springer, New York, NY, USA, 1995.
[36] L. Personnaz, I. Guyon, and G. Dreyfus, "Information storage and retrieval in spin-glass like neural networks," Journal of Physics Letters, vol. 46, pp. L359–L365, 1985.
[37] M. H. Hassoun, "Dynamic heteroassociative neural memories," Neural Networks, vol. 2, no. 4, pp. 275–287, 1989.
[38] S. Chartier and R. Proulx, "NDRAM: nonlinear dynamic recurrent associative memory for learning bipolar and nonbipolar correlated patterns," IEEE Transactions on Neural Networks, vol. 16, no. 6, pp. 1393–1400, 2005.
[39] B. Kosko, "Unsupervised learning in noise," IEEE Transactions on Neural Networks, vol. 1, no. 1, pp. 44–57, 1990.
[40] R. S. Sutton, "Learning to predict by the methods of temporal differences," Machine Learning, vol. 3, no. 1, pp. 9–44, 1988.
[41] M. N. Dailey, G. W. Cottrel, and J. Reilly, 2001, http://www.cs.ucsd.edu/users/gary/CAFE/.
[42] K. Tabari, M. Boukadoum, S. Chartier, and H. Lounis, "Reconnaissance d'expressions faciales à l'aide d'une mémoire associative bidirectionnelle à fonction de sortie chaotique" [Facial expression recognition using a bidirectional associative memory with a chaotic output function], in Proceedings of the Maghrebian Conference on Software Engineering and Artificial Intelligence, pp. 422–426, Agadir, Morocco, December 2006.
[43] S. Chartier and M. Boukadoum, "A sequential dynamic heteroassociative memory for multistep pattern recognition and one-to-many association," IEEE Transactions on Neural Networks, vol. 17, no. 1, pp. 59–68, 2006.
[44] K. Okajima, S. Tanaka, and S. Fujiwara, "A heteroassociative memory network with feedback connection," in Proceedings of the 1st International Joint Conference on Neural Networks, pp. II.711–II.718, San Diego, Calif, USA, 1987.
[45] M. Hagiwara, "Multidirectional associative memory," in Proceedings of the International Joint Conference on Neural Networks, pp. 3–6, Washington, DC, USA, 1990.
[46] A. F. R. Araujo and M. Vieira, "Temporal multidirectional associative memory: adaptable, continuous, and self-connected MAM," in Proceedings of the IEEE International Conference on Neural Networks, vol. 2, pp. 1194–1198, Houston, Tex, USA, June 1997.
Copyright
Copyright © 2011 Sylvain Chartier and Mounir Boukadoum. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.