Table of contents

Volume 22

Number 12, June 1989

SPECIAL ISSUE IN MEMORY OF ELIZABETH GARDNER (1957-1988)

PAPERS

1959

The author presents: (i) basic issues; (ii) computing by dynamic flow and computing with attractors; (iii) physics, biology and the neural code; (iv) realistic ab initio models and idealized abstracted models.

1969

The size of the basin of attraction for randomly sparse neural networks with optimal interactions is calculated. For all values of the storage ratio alpha = p/C < 2, where p is the number of random uncorrelated patterns and C is the connectivity, the basin of attraction is finite, while for alpha < 0.42 the basin of attraction is (almost) 100%.

1975

The authors give the expression for both integer and non-integer moments of the partition function Z of the random energy model. In the thermodynamic limit, they find that the probability distribution P(Z) can be decomposed into two parts. For log Z - ⟨log Z⟩ finite, the distribution is independent of N, the size of the system, whereas for log Z - ⟨log Z⟩ positive and of order N, the distribution is Gaussian. These two parts match in the region 1 << log Z - ⟨log Z⟩ << N, where the distribution is exponential.
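
For orientation, the random energy model referred to here can be stated in its standard form (a textbook definition, not quoted from the paper): the system has 2^N energy levels E_i drawn independently from a Gaussian distribution, and the partition function is

\[
Z = \sum_{i=1}^{2^N} e^{-\beta E_i}, \qquad
P(E_i) = \frac{1}{\sqrt{\pi N J^2}}\,\exp\!\left(-\frac{E_i^2}{N J^2}\right).
\]

The decomposition of P(Z) described above separates typical fluctuations of log Z (of order 1) from rare deviations of order N.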

1983

The optimal storage properties of three different neural network models are studied. For two of these models the architecture of the network is a perceptron with ±J interactions, whereas for the third model the output can be an arbitrary function of the inputs. Analytic bounds and numerical estimates of the optimal capacities and of the minimal fraction of errors are obtained for the first two models. The third model can be solved exactly and the exact solution is compared to the bounds and to the results of numerical simulations used for the two other models.

1995

The authors calculate the typical fraction of the phase space of interactions which solve the problem of storing a given set of p patterns represented as N-spin configurations, as a function of the storage ratio alpha = p/N, of the stability parameter kappa, and of the symmetry eta of the interaction matrices. The calculation is performed for strongly diluted networks, where the connectivity of each spin, C, is of the order of ln N. For each value of kappa and eta, there is a maximal value of alpha above which the volume of solutions vanishes. For each value of kappa and alpha, there is a typical value of eta at which this volume is maximal. The analytical studies are supplemented by numerical simulations on fully connected and diluted networks, using specific learning algorithms.
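
For reference, the familiar special case of Gardner's calculation without the symmetry constraint gives the critical capacity alpha_c(kappa) = 1 / ∫_{-kappa}^{∞} Dt (t + kappa)^2, with Dt the unit Gaussian measure, so that alpha_c(0) = 2. The short Python sketch below (assuming numpy and scipy are available) evaluates this standard formula numerically; it does not reproduce the eta-dependent diluted result of the paper.

import numpy as np
from scipy.integrate import quad

def alpha_c(kappa):
    # Gardner's replica-symmetric capacity: 1 / int_{-kappa}^{inf} Dt (t + kappa)^2
    integrand = lambda t: np.exp(-t**2 / 2) / np.sqrt(2 * np.pi) * (t + kappa)**2
    val, _ = quad(integrand, -kappa, np.inf)
    return 1.0 / val

for k in (0.0, 0.5, 1.0, 2.0):
    print(f"kappa = {k:3.1f}   alpha_c = {alpha_c(k):.3f}")   # alpha_c(0) = 2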

2009

The author considers a Hebbian learning mechanism, which gives rise to a change in synaptic efficacies only if the postsynaptic neuron is active. The model is solved analytically in the limit of strong dilution. The network is shown to classify initial configurations according to their mean activity and their overlap with one of the learnt patterns. The capacity of the network is calculated as a function of the threshold.

2019

Local iterative learning algorithms for the interactions between Ising spins in neural network models are discussed. They converge to solutions with basins of attraction whose shape is determined by the noise in the training data, provided such solutions exist. The training is applied both to the storage of random patterns and to a model for the storage of correlated words. The existence of correlations increases the storage capacity of a given network beyond that for random patterns. The model can be modified to store cycles of patterns and in particular is applied to the storage of continuous items of English text.
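
A schematic illustration of a local iterative rule of this kind is sketched below in Python (hypothetical parameters; not the authors' specific algorithm): each row of couplings is trained independently with a perceptron-type update, using noise-corrupted versions of the stored patterns so that the resulting basins of attraction reflect the training noise.

import numpy as np

rng = np.random.default_rng(1)
N, p, noise, sweeps = 100, 10, 0.1, 2000
xi = rng.choice([-1, 1], size=(p, N))                    # random +/-1 patterns to store
J = np.zeros((N, N))

for _ in range(sweeps):
    mu = rng.integers(p)
    x = xi[mu] * np.where(rng.random(N) < noise, -1, 1)  # noisy version of pattern mu
    for i in range(N):
        h = J[i] @ x - J[i, i] * x[i]                    # local field at site i (no self-coupling)
        if xi[mu, i] * h <= 0:                           # noisy input not mapped back correctly
            J[i] += xi[mu, i] * x / N                    # local, perceptron-style correction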

2031

By modifying the measure used to sum over coupling matrices, the authors generalise Gardner's (1988) calculation of the fractional interaction-space volume and storage capacity of neural network models. They also compute the local field distribution for the network. The generalised measure allows one to consider networks with a wide variety of properties away from saturation, but they find that the original results for saturated networks are universal for all well-behaved measures. Other universality classes, including those containing Hebb matrices and pseudo-inverse matrices, are obtained by considering singular measures.

2039

The authors study neural network models in which the synaptic efficacies are restricted to have a prescribed set of signs. It is proved that such neural networks can learn a set of random patterns by a perceptron-like algorithm which respects the synaptic restrictions at every step. In particular, it is shown that learning can take place iteratively in a network which obeys Dale's rule, i.e. in which neurons are exclusively excitatory or inhibitory. The learning algorithm as well as its convergence theorem are stated in perceptron language, and it is proved that the algorithm converges under the same conditions as required for an unconstrained perceptron. Numerical experiments show that these necessary conditions can actually be met for relatively large sets of patterns to be learned. The authors then argue that the results do not depend on the distribution of the signs, owing to a gauge invariance for random patterns. As a consequence, the same sets of random patterns can be learned by networks which have any fixed distribution of synaptic signs, ranging from fully inhibitory to fully excitatory.
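
A minimal Python sketch of a sign-constrained perceptron update of this general kind is given below (an illustration of the constraint-respecting idea, not the authors' exact rule or convergence proof): after each standard perceptron correction, weights whose sign would violate the prescribed constraint are clipped to zero, so the restrictions hold at every step.

import numpy as np

rng = np.random.default_rng(2)
N, p = 50, 20
signs = rng.choice([-1, 1], size=N)        # prescribed sign of each synapse (Dale-like constraint)
xi = rng.choice([-1, 1], size=(p, N))      # input patterns
sigma = rng.choice([-1, 1], size=p)        # desired outputs
w = signs * rng.random(N)                  # initial weights already obeying the constraint

for sweep in range(1000):
    stable = True
    for mu in range(p):
        if sigma[mu] * (w @ xi[mu]) <= 0:  # pattern mu not (yet) correctly classified
            w += sigma[mu] * xi[mu] / N    # standard perceptron update
            w[np.sign(w) == -signs] = 0.0  # project back onto the allowed signs
            stable = False
    if stable:
        break                              # all patterns stored with the required signs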

2047

An energy function is proposed whose long-time dynamic behaviour is believed to resemble that of a realistic large competitive neural network. A simple model which gives rise to this energy function is described. The model is solved exactly for a finite number of patterns in the thermodynamic limit using mean-field theory. Simulations of this model are presented. The behaviour of large competitive networks is discussed.

2057

The mean-field equations of a Q-state clock neural network model are derived in the replica-symmetric approximation using the replica method. These equations are studied for the cases Q = 2, 3, 4 and Q to infinity. It is shown that an infinite number of patterns can be stored in the network even in the limit Q to infinity. The phase diagram and storage capacity of the network are calculated and the information content is considered. Although the overlap achieved with a nominated pattern is shown to decrease as Q increases, the information stored in the network is shown to increase.
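
In one standard formulation of such a model (given here for orientation; the paper's conventions may differ), each neuron takes one of Q equally spaced planar states and the patterns are stored with a Hebb-type rule on the complex unit circle:

\[
s_j \in \{\, e^{2\pi i n/Q} : n = 0, \dots, Q-1 \,\}, \qquad
J_{jk} = \frac{1}{N} \sum_{\mu=1}^{p} \xi_j^{\mu} \bigl(\xi_k^{\mu}\bigr)^{*}, \qquad
m^{\mu} = \frac{1}{N} \sum_{j} \bigl(\xi_j^{\mu}\bigr)^{*} s_j ,
\]

with Q = 2 reducing to the usual Ising (Hopfield-type) network.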

2069

The dynamics of asymmetrically diluted neural networks can be solved exactly. In the present work, the distribution of the neural activities is calculated analytically for zero-temperature parallel dynamics. This distribution depends on the number of stored patterns and is a continuous function in the good retrieval phase. The continuous part of the distribution of activities is due to the asymmetry of the synapses since it is known that networks with symmetric interactions always have a distribution of activities which is a sum of a few delta functions. The expression for the distribution of activities is also given for a mixture of two patterns which have a non-zero overlap.

2081

Exact solutions for the dynamics of layered feedforward neural networks are presented. These networks are expected to respond to an input by going through a sequence of preassigned states on the various layers. The family of networks considered has a variety of interlayer couplings: linear and non-linear Hebbian, Hebbian with Gaussian synaptic noise and with various kinds of dilution. In addition, the authors also solve the problem of layered networks with the pseudoinverse (projector) matrix of couplings. In all cases the solutions take the form of layer-to-layer recursions for the mean overlap with a (random) key pattern and for the width of the embedding field distribution. The dynamics is governed by the fixed points of these recursions. For all cases, non-trivial domains of attraction of the memory states are found and graphically displayed.
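
Schematically, the layer-to-layer recursions mentioned above take the form (written here in a generic form for orientation, not quoting the paper's detailed expressions)

\[
m_{\ell+1} = \operatorname{erf}\!\left(\frac{m_\ell}{\sqrt{2}\,\Delta_\ell}\right),
\qquad
\Delta_{\ell+1} = G(m_\ell, \Delta_\ell),
\]

where m_\ell is the overlap of layer \ell with the key pattern, \Delta_\ell is the width of the embedding-field distribution on that layer, and the function G depends on the choice of couplings (Hebbian, diluted, noisy, pseudoinverse). Retrieval corresponds to a fixed point with m* close to 1.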

2103

A neural network introduced by Tsodyks and Feigel'man (1988) suitable for the storage of biased patterns is studied in a randomly diluted form. Coupled evolution equations are derived for the two order parameters needed to describe a configuration near to a stored pattern. These equations are studied numerically and are found to exhibit non-linear effects such as spiralling trajectories and limit cycles. The bifurcations by which the transition to no memory occurs are illustrated. It is seen that a nominated pattern may not be within the basin of attraction of the memory fixed point correlated with it. The structure of a memory fixed point is investigated and found to be more complicated than a single configuration. Finally the situation where two patterns are highly correlated is examined and phase boundaries separating the regimes of no memory, undistinguishing memory and distinguishing memory are constructed. Multiple storage of a pattern does not improve its recall without appropriate modification of the thresholding parameter.

2119

The authors determine the coordination number in ordered and disordered packings of monosize spheres, following a first treatment given by Elizabeth Gardner (1985). The average number of contacts per unit volume is derived from the slope at the origin of the distribution law for the separators obtained by passing a random line across the assembly of spheres. Moreover, this law may be completely calculated and is universal up to a multiplicative constant. They have performed tests on numerical ordered and disordered assemblies of spheres and obtained the complete distribution function for the separators. They discuss the difference between the theoretical biparticle and stereological approaches, and how a pair correlation function analysis may extend the present treatment.

2133

The authors investigate learning in the simplest type of layered neural network, the one-layer perceptron. The learning process is treated as a statistical dynamical problem. Quantities of interest include the relaxation time (the learning time) and the capacity, and how they depend on noise and on constraints on the weights. The relaxation time is calculated as a function of the noise level and the number p of associations to be learned. They consider three different cases for input patterns that are random and uncorrelated. In the first, where the connection weights are constrained to satisfy N^-1 Sigma_i omega_i^2 = S^2, there is a critical value of p (<N) separating regimes of perfect and imperfect learning at zero noise. In contrast, the second model, unconstrained learning, exhibits a different kind of transition at p = N, and noise plays no role. In the third model, where the constraint is imposed only on the thermal fluctuations, there is a line of phase transitions terminating at p = N and zero noise. They have also considered learning with correlated input patterns. The most important difference is the emergence of a second relaxation time, which the authors interpret as the time it takes to learn a prototype of the patterns.

2151

The two-point connected correlation function, or wavevector-dependent susceptibility, of the ferromagnetic Ising chain in a random field is calculated exactly at any temperature, for a two-parameter family of diluted symmetric exponential distributions of the magnetic fields. Thermodynamic properties of this model have been derived in a previous work by the authors. Besides the correlation function itself, the solution provides exact results for the (usual) susceptibility, the correlation length and the Edwards-Anderson parameter. The low-temperature regime is examined in full detail: the authors obtain in closed form the limit value at zero temperature of various quantities, and the first correction to this limit, which behaves linearly with temperature. The correlation length is discontinuous at zero temperature. They also derive the scaling form of the correlation function for small p, where the dilution p is the probability for a spin to be subjected to a non-zero random field. Even in this limit, the correlation function is more complicated than the simple Lorentzian predicted, e.g., by mean-field theory.

2181

Gardner's (1987, 1988) computation of the number of N-bit patterns which can be stored in an optimal neural network used as an associative memory is derived without replicas, using the cavity method. This allows for a unified presentation whatever the basic measure in the space of coupling constants and, above all, it makes the physical content of the replica-symmetry assumption clear. TAP equations are also derived.

2191

The authors propose a new algorithm which builds a feedforward layered network in order to learn any Boolean function of N Boolean units. The number of layers and the number of hidden units in each layer are not prescribed in advance: they are outputs of the algorithm. It is an algorithm for growth of the network, which adds layers, and units inside a layer, at will until convergence. The convergence is guaranteed and numerical tests of this strategy look promising.

2205

The authors analyse the behaviour of an attractor neural network which exhibits low mean temporal activity levels, despite the fact that the intrinsic neuronal cycle time is very short (2-3 ms). Information and computation are represented on the excitatory neurons only. The influence of inhibitory neurons, which are assumed to react on a shorter timescale than the excitatory ones, is expressed as an effective interaction of the excitatory neurons. This leads to an effective model, which describes the interplay of excitation and inhibition acting on excitatory neurons in terms of the excitatory neural variables alone. The network operates in the presence of fast noise, which is large relative to the frozen randomness induced by the stored patterns. The overall fraction of active neurons is controlled by a single free parameter, which expresses the relative strength of the effective inhibition. Associative retrieval is identified, as usual, with the breakdown of ergodicity in the dynamics of the network, in particular with the presence of dynamical attractors corresponding to the retrieval of a given pattern. In such an attractor, the activity of neurons corresponding to active sites in the stored patterns increases at the expense of other neurons. Yet only a small fraction of the neurons active in the pattern are in the active state in each elementary time cycle, and they vary from cycle to cycle in an uncorrelated fashion, due to the noise. Hence, the observed mean activity rate of any individual neuron is kept low. This scenario is demonstrated by an analytical study based on the replica method, and the results are tested by numerical simulations.

2227

Gardner's (1988) analysis is used to study the evolution of a perceptron trained on a rule-controlled mapping with exceptions after it has gone through an irreversible random process of deterioration or lesion. It is shown that entropy considerations lead to useful statistical inferences based on partial information about the consequences of the lesions. In particular it is shown that patterns that follow the rule are more robust than patterns that have an exceptional response.

2233

The Aleksander model of neural networks replaces the connection weights of conventional models by logic devices (or Boolean functions). Learning is achieved by adjusting the Boolean functions stepwise via a 'training-with-noise' algorithm. The authors present a theory of the statistical dynamical properties of the randomly connected model and demonstrate that, in the limit of large but dilute connectivity c of the nodes, the storage capacity for associative memory is of the order (2/c^2)2^c, which corresponds, roughly speaking, to an average of one nearest-neighbouring pattern stored at site distances 2 on each node. Two parameters are introduced into the learning algorithm: qr and qc being respectively the probabilities to register a correct bit and erase an incorrect one. The effects of varying qr, qc and the training noise level on the storage capacity (after very long training) are discussed. In the limit of low training noise level, the training algorithm is equivalent to the so-called 'proximity rules'. Study of its retrieval properties shows that the model can be described as 'short ranged', whereas the Hopfield model is 'long ranged'. The advantages and disadvantages of introducing the intermediate u state into the system are also discussed.

2265

The dynamics of a neural network which uses three-state neurons (1, 0 and -1) is solved exactly in the limit of non-symmetric and highly dilute synapses. Recursion relations for the 'activity' (the fraction of non-zero neurons) and overlap of the network with a given pattern are derived which have three generic kinds of fixed points: a retrieval fixed point, a chaotic fixed point which corresponds to non-zero activity but no overlap, and a 'zero' fixed point where all the neurons go to the 0 state. As the non-retrieval fixed points both have activities different from the retrieval fixed point, one can easily tell whether a pattern has been recovered. An analysis of which fixed points occur as a function of the thresholds and the storage ratio of the system yields remarkably rich phase diagrams. Optimising the threshold level can be very important, especially when low-activity patterns are stored. A similar analysis can be applied to 'biased' networks using two-state (1,0) neurons. Finally, one finds that mixture states which have an overlap with two patterns can be stabilised by a threshold in the networks using three-state neurons. This property allows 'larger' (higher activity) memories to be naturally constructed out of smaller ones.
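
For orientation, a conventional way to write three-state threshold dynamics of this kind is (a standard form, assumed here rather than quoted from the paper)

\[
S_i(t+1) =
\begin{cases}
\operatorname{sgn} h_i(t), & |h_i(t)| > \theta, \\
0, & |h_i(t)| \le \theta,
\end{cases}
\qquad
h_i(t) = \sum_{j} J_{ij} S_j(t),
\]

so that the two order parameters followed in the recursions are the activity a(t), the fraction of non-zero neurons, and the overlap m(t) = (1/N) Sigma_i xi_i S_i(t) with the nominated pattern.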