Neural Networks, A Comprehensive Foundation
Simon Haykin, 1999

Neural Networks, or Artificial Neural Networks to be more precise, represent a technology rooted in many disciplines: neurosciences, mathematics, statistics, physics, computer science, and engineering. Neural networks find application in such diverse fields as modeling, time series analysis, pattern recognition, signal processing, and control by virtue of an important property: the ability to learn from input data with or without a teacher.

This book provides a comprehensive foundation of neural networks, recognizing the interdisciplinary nature of the subject.

The book consists of four parts, organized as follows:

Chapter 1. Introduction

1.1. What is a Neural Network?
The brain routinely accomplishes perceptual recognition tasks (e.g., recognizing a familiar face embedded in an unfamiliar scene) in approximately 100-200 ms, whereas tasks of much lesser complexity may take days on a conventional computer. For another example, consider the sonar of a bat. In addition to providing information about how far away a target is, bat sonar conveys information about the relative velocity of the target, its size, the size of its various features, and its azimuth and elevation. The complex neural computations needed to extract all this information from the target echo occur within a brain the size of a plum.

We offer the following definition of a neural network viewed as an adaptive machine:

A neural network is a massively parallel distributed processor made up of simple processing units, which has a natural propensity for storing experiential knowledge and making it available for use. It resembles the brain in two respects:
1. Knowledge is acquired by the network from its environment through a learning process.
2. Interneuron connection strengths, known as synaptic weights, are used to store the acquired knowledge.

1.2. Human Brain

1.3. Models of a Neuron
We identify three basic elements of the neuronal model:
1. A set of synapses, or connecting links, each of which is characterized by a weight or strength of its own.
2. An adder for summing the input signals.
3. An activation function for limiting the amplitude of the output of the neuron.
The neuronal model also includes a bias, which has the effect of increasing or lowering the net input of the activation function.
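The three elements above, together with the bias, can be sketched in a few lines of code. This is an illustrative sketch, not code from the book; the logistic sigmoid is assumed here as the activation function, one of several choices discussed in the chapter.

```python
import math

def neuron(inputs, weights, bias):
    """A single neuron: synaptic weights, an adder, and an
    activation function limiting the output amplitude."""
    # Adder: sum the input signals, weighted by their synapses,
    # plus the bias, which raises or lowers the net input
    v = sum(x * w for x, w in zip(inputs, weights)) + bias
    # Activation function: logistic sigmoid squashes v into (0, 1)
    return 1.0 / (1.0 + math.exp(-v))

# Example: a neuron with two inputs
y = neuron([1.0, 0.5], [0.8, -0.4], bias=0.1)  # net input v = 0.7
```

With a net input of 0.7, the sigmoid yields an output of about 0.67, illustrating how the activation function keeps the output within a bounded range regardless of the size of the weighted sum.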

1.4. Neural Networks Viewed as Directed Graphs

1.5. Feedback

1.6. Network Architectures

1.7. Knowledge Representation

1.8. Artificial Intelligence and Neural Networks

1.9. Historical Notes

Chapter 2. Learning Processes

2.1. Introduction
The property that is of primary significance for a neural network is the ability to learn from its environment, and to improve its performance through learning. A neural network learns about its environment over time through an interactive process of adjustments applied to its synaptic weights and bias levels.

2.2. Error-Correction Learning

2.3. Memory-Based Learning

2.4. Hebbian Learning

2.5. Competitive Learning

2.6. Boltzmann Learning

2.7. Credit Assignment Problem

2.8. Learning with a Teacher

2.9. Learning without a Teacher

2.10. Learning Tasks

2.11. Memory

2.12. Adaptation

2.13. Statistical Nature of the Learning Process

2.14. Statistical Learning Theory

2.15. Probably Approximately Correct Model of Learning

2.16. Summary and Discussion

Chapter 3. Single Layer Perceptrons

3.1. Introduction

3.2. Adaptive Filtering Problem

3.3. Unconstrained Optimization Techniques

3.4. Linear Least-Squares Filters

3.5. Least-Mean-Square Algorithm

3.6. Learning Curves

3.7. Learning Rate Annealing Techniques

3.8. Perceptron

3.9. Perceptron Convergence Theorem

3.10. Relation Between the Perceptron and Bayes Classifier for a Gaussian Environment

3.11. Summary and Discussion

Chapter 4. Multilayer Perceptrons

4.1. Introduction

4.2. Some Preliminaries

4.3. Back-Propagation Algorithm

4.4. Summary of the Back-Propagation Algorithm

4.5. XOR Problem

4.6. Heuristics for Making the Back-Propagation Algorithm Perform Better

4.7. Output Representation and Decision Rule

4.8. Computer Experiment

4.9. Feature Detection

4.10. Back-Propagation and Differentiation

4.11. Hessian Matrix

4.12. Generalization

4.13. Approximation of Functions

4.14. Cross-Validation

4.15. Network Pruning Techniques

4.16. Virtues and Limitations of Back-Propagation Learning

4.17. Accelerated Convergence of Back-Propagation Learning

4.18. Supervised Learning Viewed as an Optimization Problem

4.19. Convolutional Networks

4.20. Summary and Discussion

Chapter 5. Radial-Basis Function Networks

5.1. Introduction

5.2. Cover's Theorem on the Separability of Patterns

5.3. Interpolation Problem

5.4. Supervised Learning as an Ill-Posed Hypersurface Reconstruction Problem

5.5. Regularization Theory

5.6. Regularization Networks

5.7. Generalized Radial-Basis Function Networks

5.8. XOR Problem (Revisited)

5.9. Estimation of the Regularization Parameter

5.10. Approximation Properties of RBF Networks

5.11. Comparison of RBF Networks and Multilayer Perceptrons

5.12. Kernel Regression and Its Relation to RBF Networks

5.13. Learning Strategies

5.14. Computer Experiment

5.15. Summary and Discussion

Chapter 6. Support Vector Machines

6.1. Introduction

6.2. Optimal Hyperplane for Linearly Separable Patterns

6.3. Optimal Hyperplane for Nonseparable Patterns

6.4. How to Build a Support Vector Machine for Pattern Recognition

6.5. Example: XOR Problem (Revisited)

6.6. Computer Experiment

6.7. Epsilon-Insensitive Loss Function

6.8. Support Vector Machines for Nonlinear Regression

6.9. Summary and Discussion

Chapter 7. Committee Machines

7.1. Introduction

7.2. Ensemble Averaging

7.3. Computer Experiment I

7.4. Boosting

7.5. Computer Experiment II

7.6. Associative Gaussian Mixture Model

7.7. Hierarchical Mixture of Experts Model

7.8. Model Selection Using a Standard Decision Tree

7.9. A Priori and A Posteriori Probabilities

7.10. Maximum Likelihood Estimation

7.11. Learning Strategies for the HME Model

7.12. EM Algorithm

7.13. Application of the EM Algorithm to the HME Model

7.14. Summary and Discussion

Chapter 8. Principal Components Analysis

8.1. Introduction

8.2. Some Intuitive Principles of Self-Organization

8.3. Principal Components Analysis

8.4. Hebbian-Based Maximum Eigenfilter

8.5. Hebbian-Based Principal Components Analysis

8.6. Computer Experiment: Image Coding

8.7. Adaptive Principal Components Analysis Using Lateral Inhibition

8.8. Two Classes of PCA Algorithms

8.9. Batch and Adaptive Methods of Computation

8.10. Kernel-Based Principal Components Analysis

8.11. Summary and Discussion

Chapter 9. Self-Organizing Maps

9.1. Introduction

9.2. Two Basic Feature-Mapping Models

9.3. Self-Organizing Map

9.4. Summary of the SOM Algorithm

9.5. Properties of the Feature Map

9.6. Computer Simulations

9.7. Learning Vector Quantization

9.8. Computer Experiment: Adaptive Pattern Classification

9.9. Hierarchical Vector Quantization

9.10. Contextual Maps

9.11. Summary and Discussion

Chapter 10. Information-Theoretic Models

10.1. Introduction

10.2. Entropy

10.3. Maximum Entropy Principle

10.4. Mutual Information

10.5. Kullback-Leibler Divergence

10.6. Mutual Information as an Objective Function to be Optimized

10.7. Maximum Mutual Information Principle

10.8. Infomax and Redundancy Reduction

10.9. Spatially Coherent Features

10.10. Spatially Incoherent Features

10.11. Independent Components Analysis

10.12. Computer Experiment

10.13. Maximum Likelihood Estimation

10.14. Maximum Entropy Method

10.15. Summary and Discussion

Chapter 11. Stochastic Machines and Their Approximates Rooted in Statistical Mechanics

11.1. Introduction

11.2. Statistical Mechanics

11.3. Markov Chains

11.4. Metropolis Algorithm

11.5. Simulated Annealing

11.6. Gibbs Sampling

11.7. Boltzmann Machine

11.8. Sigmoid Belief Networks

11.9. Helmholtz Machine

11.10. Mean-Field Theory

11.11. Deterministic Boltzmann Machine

11.12. Deterministic Sigmoid Belief Networks

11.13. Deterministic Annealing

11.14. Summary and Discussion

Chapter 12. Neurodynamic Programming

12.1. Introduction

12.2. Markovian Decision Processes

12.3. Bellman's Optimality Criterion

12.4. Policy Iteration

12.5. Value Iteration

12.6. Neurodynamic Programming

12.7. Approximate Policy Iteration

12.8. Q-Learning

12.9. Computer Experiment

12.10. Summary and Discussion

Chapter 13. Temporal Processing Using Feedforward Networks

13.1. Introduction

13.2. Short-Term Memory Structures

13.3. Network Architectures for Temporal Processing

13.4. Focused Time Lagged Feedforward Networks

13.5. Computer Experiment

13.6. Universal Myopic Mapping Theorem

13.7. Spatio-Temporal Models of a Neuron

13.8. Distributed Time Lagged Feedforward Networks

13.9. Temporal Back-Propagation Algorithm

13.10. Summary and Discussion

Chapter 14. Neurodynamics

14.1. Introduction

14.2. Dynamical Systems

14.3. Stability of Equilibrium States

14.4. Attractors

14.5. Neurodynamical Models

14.6. Manipulation of Attractors as a Recurrent Network Paradigm

14.7. Hopfield Models

14.8. Computer Experiment I

14.9. Cohen-Grossberg Theorem

14.10. Brain-State-in-a-Box Model

14.11. Computer Experiment II

14.12. Strange Attractors and Chaos

14.13. Dynamic Reconstruction of a Chaotic Process

14.14. Computer Experiment III

14.15. Summary and Discussion

Chapter 15. Dynamically Driven Recurrent Networks

15.1. Introduction

15.2. Recurrent Network Architectures

15.3. State-Space Model

15.4. Nonlinear Autoregressive with Exogenous Inputs Model

15.5. Computational Power of Recurrent Networks

15.6. Learning Algorithms

15.7. Back-Propagation Through Time

15.8. Real-Time Recurrent Learning

15.9. Kalman Filters

15.10. Decoupled Extended Kalman Filters

15.11. Computer Experiment

15.12. Vanishing Gradients in Recurrent Networks

15.13. System Identification

15.14. Model-Reference Adaptive Control

15.15. Summary and Discussion
