**Preface**

Neural Networks, or Artificial Neural Networks to be more precise, represent
a technology that is rooted in many disciplines: neurosciences, mathematics,
statistics, physics, computer science, and engineering. Neural networks
find application in such diverse fields as modeling, time series analysis,
pattern recognition, signal processing, and control by virtue of an important
property: the ability to *learn* from input data with or without a
teacher.

This book provides a comprehensive foundation of neural networks, recognizing
the interdisciplinary nature of the subject.

The book is organized as follows:

**Chapter 1. Introduction**

**1.1. What is a Neural Network?**

The brain routinely accomplishes perceptual recognition tasks (e.g., recognizing
a familiar face embedded in an unfamiliar scene) in approximately 100-200 ms,
whereas tasks of much lesser complexity may take days on a conventional computer.
For another example, consider the sonar of a bat. In addition to providing
information about how far away a target is, bat sonar conveys information
about the target's relative velocity, its size, the size of its various
features, and its azimuth and elevation. The complex neural computations needed
to extract all this information from the target echo occur within a brain the
size of a plum.

We offer the following definition of a neural network viewed as an adaptive
machine:

*A neural network is a massively parallel distributed processor
made up of simple processing units, which has a natural propensity for
storing experiential knowledge and making it available for use. It resembles
the brain in two respects:*

1. Knowledge is acquired by the network from its environment through a
learning process.

2. Interneuron connection strengths, known as synaptic weights, are used to
store the acquired knowledge.

**1.2. Human Brain**

**1.3. Models of a Neuron**

We identify three basic elements of the neuronal model:

1. A set of *synapses*, or connecting links, each of which is characterized
by a *weight* or strength of its own.

2. An *adder* for summing the input signals.

3. An *activation function* for limiting the amplitude of the output of
the neuron.

The neuronal model also includes a *bias*, which has the effect of
increasing or lowering the net input of the activation function.
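The three elements above, together with the bias, can be sketched as a single artificial neuron. The sketch below uses a logistic sigmoid as the activation function and arbitrary weight, bias, and input values; these particulars are illustrative assumptions, not prescribed by the text:

```python
import math

def neuron_output(inputs, weights, bias):
    """Output of a single neuron model, following the three basic elements:

    1. Synapses: each input x_i is scaled by its own weight w_i.
    2. Adder: the weighted inputs, plus the bias, are summed.
    3. Activation function: a logistic sigmoid limits the output to (0, 1).
    """
    v = sum(w * x for w, x in zip(weights, inputs)) + bias  # net input
    return 1.0 / (1.0 + math.exp(-v))                       # limited amplitude

# Hypothetical inputs, weights, and bias
y = neuron_output([0.5, -1.0], weights=[0.8, 0.2], bias=0.1)
print(0.0 < y < 1.0)  # prints True: the sigmoid confines the output
```

Note how the bias enters the adder directly, increasing or lowering the net input before the activation function is applied.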

**1.4. Neural Networks Viewed as Directed Graphs**

**1.5. Feedback**

**1.6. Network Architectures**

**1.7. Knowledge Representation**

**1.8. Artificial Intelligence and Neural Networks**

**1.9. Historical Notes**

**Chapter 2. Learning Processes**

**2.1. Introduction**

The property that is of primary significance for a neural network is the ability
to *learn* from its environment, and to *improve* its performance
through learning. A neural network learns about its environment over time
through an interactive process of adjustments applied to its synaptic weights
and bias levels.

**2.2. Error-Correction Learning**

**2.3. Memory-Based Learning**

**2.4. Hebbian Learning**

**2.5. Competitive Learning**

**2.6. Boltzmann Learning**

**2.7. Credit Assignment Problem**

**2.8. Learning with a Teacher**

**2.9. Learning without a Teacher**

**2.10. Learning Tasks**

**2.11. Memory**

**2.12. Adaptation**

**2.13. Statistical Nature of the Learning Process**

**2.14. Statistical Learning Theory**

**2.15. Probably Approximately Correct Model of Learning**

**2.16. Summary and Discussion**

**Chapter 3. Single Layer Perceptrons**

**3.1. Introduction**

**3.2. Adaptive Filtering Problem**

**3.3. Unconstrained Optimization Techniques**

**3.4. Linear Least-Squares Filters**

**3.5. Least-Mean-Square Algorithm**

**3.6. Learning Curves**

**3.7. Learning Rate Annealing Techniques**

**3.8. Perceptron**

**3.9. Perceptron Convergence Theorem**

**3.10. Relation Between the Perceptron and Bayes Classifier for a Gaussian Environment**

**3.11. Summary and Discussion**

**Chapter 4. Multilayer Perceptrons**

**4.1. Introduction**

**4.2. Some Preliminaries**

**4.3. Back-Propagation Algorithm**

**4.4. Summary of the Back-Propagation Algorithm**

**4.5. XOR Problem**

**4.6. Heuristics for Making the Back-Propagation Algorithm Perform Better**

**4.7. Output Representation and Decision Rule**

**4.8. Computer Experiment**

**4.9. Feature Detection**

**4.10. Back-Propagation and Differentiation**

**4.11. Hessian Matrix**

**4.12. Generalization**

**4.13. Approximation of Functions**

**4.14. Cross-Validation**

**4.15. Network Pruning Techniques**

**4.16. Virtues and Limitations of Back-Propagation Learning**

**4.17. Accelerated Convergence of Back-Propagation Learning**

**4.18. Supervised Learning Viewed as an Optimization Problem**

**4.19. Convolutional Networks**

**4.20. Summary and Discussion**

**Chapter 5. Radial-Basis Function Networks**

**5.1. Introduction**

**5.2. Cover's Theorem on the Separability of Patterns**

**5.3. Interpolation Problem**

**5.4. Supervised Learning as an Ill-Posed Hypersurface Reconstruction Problem**

**5.5. Regularization Theory**

**5.6. Regularization Networks**

**5.7. Generalized Radial-Basis Function Networks**

**5.8. XOR Problem (Revisited)**

**5.9. Estimation of the Regularization Parameter**

**5.10. Approximation Properties of RBF Networks**

**5.11. Comparison of RBF Networks and Multilayer Perceptrons**

**5.12. Kernel Regression and Its Relation to RBF Networks**

**5.13. Learning Strategies**

**5.14. Computer Experiment**

**5.15. Summary and Discussion**

**Chapter 6. Support Vector Machines**

**6.1. Introduction**

**6.2. Optimal Hyperplane for Linearly Separable Patterns**

**6.3. Optimal Hyperplane for Nonseparable Patterns**

**6.4. How to Build a Support Vector Machine for Pattern Recognition**

**6.5. Example: XOR Problem (Revisited)**

**6.6. Computer Experiment**

**6.7. Epsilon-Insensitive Loss Function**

**6.8. Support Vector Machines for Nonlinear Regression**

**6.9. Summary and Discussion**

**Chapter 7. Committee Machines**

**7.1. Introduction**

**7.2. Ensemble Averaging**

**7.3. Computer Experiment I**

**7.4. Boosting**

**7.5. Computer Experiment II**

**7.6. Associative Gaussian Mixture Model**

**7.7. Hierarchical Mixture of Experts Model**

**7.8. Model Selection Using a Standard Decision Tree**

**7.9. A Priori and A Posteriori Probabilities**

**7.10. Maximum Likelihood Estimation**

**7.11. Learning Strategies for the HME Model**

**7.12. EM Algorithm**

**7.13. Application of the EM Algorithm to the HME Model**

**7.14. Summary and Discussion**

**Chapter 8. Principal Components Analysis**

**8.1. Introduction**

**8.2. Some Intuitive Principles of Self-Organization**

**8.3. Principal Components Analysis**

**8.4. Hebbian-Based Maximum Eigenfilter**

**8.5. Hebbian-Based Principal Components Analysis**

**8.6. Computer Experiment: Image Coding**

**8.7. Adaptive Principal Components Analysis Using Lateral Inhibition**

**8.8. Two Classes of PCA Algorithms**

**8.9. Batch and Adaptive Methods of Computation**

**8.10. Kernel-Based Principal Components Analysis**

**8.11. Summary and Discussion**

**Chapter 9. Self-Organizing Maps**

**9.1. Introduction**

**9.2. Two Basic Feature-Mapping Models**

**9.3. Self-Organizing Map**

**9.4. Summary of the SOM Algorithm**

**9.5. Properties of the Feature Map**

**9.6. Computer Simulations**

**9.7. Learning Vector Quantization**

**9.8. Computer Experiment: Adaptive Pattern Classification**

**9.9. Hierarchical Vector Quantization**

**9.10. Contextual Maps**

**9.11. Summary and Discussion**

**Chapter 10. Information-Theoretic Models**

**10.1. Introduction**

**10.2. Entropy**

**10.3. Maximum Entropy Principle**

**10.4. Mutual Information**

**10.5. Kullback-Leibler Divergence**

**10.6. Mutual Information as an Objective Function to be Optimized**

**10.7. Maximum Mutual Information Principle**

**10.8. Infomax and Redundancy Reduction**

**10.9. Spatially Coherent Features**

**10.10. Spatially Incoherent Features**

**10.11. Independent Components Analysis**

**10.12. Computer Experiment**

**10.13. Maximum Likelihood Estimation**

**10.14. Maximum Entropy Method**

**10.15. Summary and Discussion**

**Chapter 11. Stochastic Machines and Their Approximates Rooted in Statistical Mechanics**

**11.1. Introduction**

**11.2. Statistical Mechanics**

**11.3. Markov Chains**

**11.4. Metropolis Algorithm**

**11.5. Simulated Annealing**

**11.6. Gibbs Sampling**

**11.7. Boltzmann Machine**

**11.8. Sigmoid Belief Networks**

**11.9. Helmholtz Machine**

**11.10. Mean-Field Theory**

**11.11. Deterministic Boltzmann Machine**

**11.12. Deterministic Sigmoid Belief Networks**

**11.13. Deterministic Annealing**

**11.14. Summary and Discussion**

**Chapter 12. Neurodynamic Programming**

**12.1. Introduction**

**12.2. Markovian Decision Processes**

**12.3. Bellman's Optimality Criterion**

**12.4. Policy Iteration**

**12.5. Value Iteration**

**12.6. Neurodynamic Programming**

**12.7. Approximate Policy Iteration**

**12.8. Q-Learning**

**12.9. Computer Experiment**

**12.10. Summary and Discussion**

**Chapter 13. Temporal Processing Using Feedforward Networks**

**13.1. Introduction**

**13.2. Short-Term Memory Structures**

**13.3. Network Architectures for Temporal Processing**

**13.4. Focussed Time Lagged Feedforward Networks**

**13.5. Computer Experiment**

**13.6. Universal Myopic Mapping Theorem**

**13.7. Spatio-Temporal Models of a Neuron**

**13.8. Distributed Time Lagged Feedforward Networks**

**13.9. Temporal Back-Propagation Algorithm**

**13.10. Summary and Discussion**

**Chapter 14. Neurodynamics**

**14.1. Introduction**

**14.2. Dynamical Systems**

**14.3. Stability of Equilibrium States**

**14.4. Attractors**

**14.5. Neurodynamical Models**

**14.6. Manipulation of Attractors as a Recurrent Network Paradigm**

**14.7. Hopfield Models**

**14.8. Computer Experiment I**

**14.9. Cohen-Grossberg Theorem**

**14.10. Brain-State-in-a-Box Model**

**14.11. Computer Experiment II**

**14.12. Strange Attractors and Chaos**

**14.13. Dynamic Reconstruction of a Chaotic Process**

**14.14. Computer Experiment III**

**14.15. Summary and Discussion**

**Chapter 15. Dynamically Driven Recurrent Networks**

**15.1. Introduction**

**15.2. Recurrent Network Architectures**

**15.3. State-Space Model**

**15.4. Nonlinear Autoregressive with Exogenous Inputs Model**

**15.5. Computational Power of Recurrent Networks**

**15.6. Learning Algorithms**

**15.7. Back-Propagation Through Time**

**15.8. Real-Time Recurrent Learning**

**15.9. Kalman Filters**

**15.10. Decoupled Extended Kalman Filters**

**15.11. Computer Experiment**

**15.12. Vanishing Gradients in Recurrent Networks**

**15.13. System Identification**

**15.14. Model-Reference Adaptive Control**

**15.15. Summary and Discussion**