A brief history of Neural Networks¶

Faisal Qureshi
http://www.vclab.ca

• Claude Shannon, Father of Information Theory.

I visualise a time when we will be to robots what dogs are to humans, and I’m rooting for the machines.

• Jeff Hawkins, Founder of Palm Computing.

The key to artificial intelligence has always been the representation.

Lesson Plan¶

• Computational models of Neurons
• Pre-deep learning
• Imagenet 2012
• Takeaways
• What
• How
• Why now?
• Impact
• Ethical and social implications

McCulloch and Pitts (1943)¶

• Proposed a model of nervous systems as a network of threshold units.
• Connections between simple units performing elementary operations give rise to intelligence.

Threshold units¶

• Neuron (picture from Wikipedia)
• Artificial neuron
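
As a minimal sketch of the artificial neuron above, a threshold unit can be written as a weighted sum followed by a hard threshold. The function name `threshold_unit` and the specific weights and thresholds are illustrative assumptions, not values taken from McCulloch and Pitts.

```python
def threshold_unit(inputs, weights, threshold):
    """Fire (return 1) when the weighted sum of the inputs reaches the threshold."""
    total = sum(w * x for w, x in zip(weights, inputs))
    return 1 if total >= threshold else 0


# Hypothetical weights and thresholds, chosen so a single unit acts as a logic gate.
def logical_and(a, b):
    return threshold_unit([a, b], weights=[1, 1], threshold=2)


def logical_or(a, b):
    return threshold_unit([a, b], weights=[1, 1], threshold=1)


# Print the truth table: inputs, AND output, OR output.
for a in (0, 1):
    for b in (0, 1):
        print(a, b, logical_and(a, b), logical_or(a, b))
```

Changing only the threshold turns the same unit from an AND gate into an OR gate, which hints at how networks of such units can compose more complex functions.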

Learning via reinforcing connections between Neurons (1949 to 1982)¶

Hebbian Learning¶

• The Hebbian learning principle (Donald Hebb, 1949) proposes learning patterns by reinforcing connections between neurons that tend to fire together.
• Biologically plausible, but it is not used in practice
• First artificial neural network consisting of 40 neurons (Marvin Minsky, 1951)
• Uses Hebbian Learning
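
The "fire together, wire together" idea can be sketched as a weight update proportional to the product of input and output activity. This is a simplified illustration, not a model of Minsky's machine; the function name `hebbian_update`, the learning rate, and the input pattern are assumptions.

```python
def hebbian_update(weights, inputs, output, learning_rate=0.1):
    """Strengthen each weight in proportion to (input activity * output activity)."""
    return [w + learning_rate * x * output for w, x in zip(weights, inputs)]


weights = [0.0, 0.0, 0.0]
pattern = [1, 0, 1]  # inputs 0 and 2 are active together

for _ in range(5):
    output = 1  # assumption: the receiving neuron fires whenever the pattern is shown
    weights = hebbian_update(weights, pattern, output)

print(weights)  # weights for the co-active inputs grow; the silent input stays at zero
```

Note that this plain rule only ever increases weights, so practical variants usually add some form of normalization or decay.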

Perceptron¶

• Frank Rosenblatt (1958) introduced the perceptron to classify 20x20 images
• A perceptron is a neural network comprising a single neuron
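
A rough sketch of a single-neuron perceptron trained with the classic error-driven update rule follows. It learns the logical AND function rather than 20x20 images; the toy data, the learning rate, and the number of epochs are assumptions made for illustration.

```python
def predict(weights, bias, x):
    """Single neuron: weighted sum followed by a hard threshold at zero."""
    activation = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if activation > 0 else 0


def train_perceptron(samples, labels, learning_rate=1.0, epochs=20):
    """Nudge the weights toward each misclassified example."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for x, target in zip(samples, labels):
            error = target - predict(weights, bias, x)  # -1, 0, or +1
            weights = [w + learning_rate * error * xi for w, xi in zip(weights, x)]
            bias += learning_rate * error
    return weights, bias


samples = [(0, 0), (0, 1), (1, 0), (1, 1)]
labels = [0, 0, 0, 1]  # logical AND, which is linearly separable

weights, bias = train_perceptron(samples, labels)
predictions = [predict(weights, bias, x) for x in samples]
print(predictions)  # prints [0, 0, 0, 1]
```

The same code cannot learn XOR, because a single neuron can only separate classes with a straight line.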

Cat visual cortex¶

• David Hubel and Torsten Wiesel studied the cat visual cortex and showed that visual information goes through a series of processing steps: 1) edge detection; 2) edge combination; 3) motion perception; etc. (Hubel and Wiesel, 1959)

Backpropagation¶

• Backpropagation for artificial neural networks (Paul Werbos, 1982)
• An application of the chain rule from differential calculus
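
To make the chain-rule view concrete, here is a minimal sketch for a hypothetical two-weight network: one tanh hidden unit feeding one linear output unit, with a squared-error loss. The network shape, the loss, and all numeric values are assumptions chosen for illustration. The analytic gradients are compared against finite differences.

```python
import math


def forward(w1, w2, x):
    h = math.tanh(w1 * x)  # hidden unit
    y = w2 * h             # output unit
    return h, y


def loss(w1, w2, x, target):
    _, y = forward(w1, w2, x)
    return 0.5 * (y - target) ** 2


def backward(w1, w2, x, target):
    """Propagate the error from the output back to each weight via the chain rule."""
    h, y = forward(w1, w2, x)
    dL_dy = y - target                  # derivative of the loss w.r.t. the output
    dL_dw2 = dL_dy * h                  # y = w2 * h, so dy/dw2 = h
    dL_dh = dL_dy * w2                  # error flowing back into the hidden unit
    dL_dw1 = dL_dh * (1.0 - h * h) * x  # d tanh(z)/dz = 1 - tanh(z)^2, dz/dw1 = x
    return dL_dw1, dL_dw2


w1, w2, x, target = 0.3, -0.8, 2.0, 1.0
dw1, dw2 = backward(w1, w2, x, target)

# Numerical check with central finite differences.
eps = 1e-6
numeric_dw1 = (loss(w1 + eps, w2, x, target) - loss(w1 - eps, w2, x, target)) / (2 * eps)
numeric_dw2 = (loss(w1, w2 + eps, x, target) - loss(w1, w2 - eps, x, target)) / (2 * eps)
print(abs(dw1 - numeric_dw1) < 1e-6, abs(dw2 - numeric_dw2) < 1e-6)
```

The intermediate quantity `dL_dh` is computed once and reused, which is what makes backpropagation efficient for networks with many layers.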

Towards (deep) neural networks¶

Neocognitron¶

• Fukushima (1980) implemented the Neocognitron, which was capable of handwritten character recognition.
• This model was based upon the findings of Hubel and Wiesel.
• This model can be seen as a precursor of modern convolutional networks.

Hidden units and backpropagation¶

• Rumelhart et al. (1988) used backpropagation to train a network similar to Neocognitron.
• Units in hidden layers learn meaningful representations

LeNet¶

• In 1989, LeCun et al. proposed LeNet, a convolutional neural network very similar to the networks that we see today
• Capable of recognizing handwritten digits
• Trained using backpropagation

Deep learning (the beginning)¶

ImageNet Large Scale Visual Recognition Challenge¶

• A large amount of training data is critical to the success of deep learning methods
• The ImageNet challenge was devised to benchmark the performance of various image recognition methods
• 1 million images belonging to 1000 different classes
• Its size was key to the development of early deep learning models

Datasets¶

• Datasets used for deep learning model development are divided into three sets:
• Training set is used to train the deep learning model;
• Validation set is used to tune the hyperparameters, implement early stopping, etc.; and
• Test set is used to evaluate model performance.
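
One possible way to produce the three sets is a shuffled split. The 80/10/10 proportions, the fixed seed, and the function name `split_dataset` are assumptions; appropriate ratios depend on the dataset and the task.

```python
import random


def split_dataset(examples, train_fraction=0.8, validation_fraction=0.1, seed=0):
    """Shuffle once, then cut into training, validation, and test sets."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    n_train = int(len(shuffled) * train_fraction)
    n_val = int(len(shuffled) * validation_fraction)
    train = shuffled[:n_train]
    validation = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # whatever remains
    return train, validation, test


train, validation, test = split_dataset(range(100))
print(len(train), len(validation), len(test))  # prints 80 10 10
```

The key property is that the three sets do not overlap, so the test set gives an estimate of performance on data the model has never seen.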

AlexNet (2012)¶

• Krizhevsky et al. trained a convolutional network, similar to LeNet-5 but containing far more layers, neurons, and connections, on the ImageNet Challenge using Graphics Processing Units (GPUs). This model beat the state-of-the-art image classification methods by a large margin.
• GPUs are critical to the success of deep learning methods.
• Models may outperform humans!?

Deep learning takes over (2012 onwards)¶

• Large datasets and vast GPU compute infrastructures led to larger and more complex deep learning models for solving problems in a variety of domains ranging
• from computer vision to speech recognition,
• from medical imaging to text understanding,
• from computer graphics to industrial design,
• from autonomous driving to drug discovery, etc.

Takeaways¶

What¶

• Deep learning is a natural extension of artificial neural networks of the 90s.
• Extracts useful patterns from data
• Learns powerful representations
• Reduces the "semantic gap"

How¶

• Chain rule (or backpropagation)
• Computes how error (or more generally, the quantity to optimize) changes when model parameters change
• Iteratively updates network parameters to "minimize the error"
• Convolutions
• Bakes in the intuition that the signal is structured and often has some stationary properties
• Allows processing of large signals
• Hidden layers
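
The first two ingredients above can be combined in one small sketch: a 1D convolution whose three kernel weights are learned by gradient descent, with the gradient obtained from the chain rule. The signal, the target kernel, the learning rate, and the step count are all assumptions made for illustration.

```python
def conv1d(signal, kernel):
    """Valid 1D convolution (no kernel flip): one shared kernel slides over every position."""
    k = len(kernel)
    return [sum(kernel[j] * signal[i + j] for j in range(k))
            for i in range(len(signal) - k + 1)]


signal = [0.0, 1.0, 2.0, 4.0, 3.0, 1.0, 0.0, 2.0]
true_kernel = [-1.0, 0.0, 1.0]  # hypothetical "edge detector" we try to recover
target = conv1d(signal, true_kernel)

kernel = [0.0, 0.0, 0.0]
learning_rate = 0.01
for step in range(2000):
    output = conv1d(signal, kernel)
    errors = [o - t for o, t in zip(output, target)]
    # Chain rule for the loss 0.5 * sum(err^2):
    # dLoss/dkernel[j] = sum over positions i of err[i] * signal[i + j]
    grads = [sum(e * signal[i + j] for i, e in enumerate(errors))
             for j in range(len(kernel))]
    # Gradient descent: step against the gradient to reduce the error.
    kernel = [w - learning_rate * g for w, g in zip(kernel, grads)]

print([round(w, 3) for w in kernel])
```

Only three parameters are learned regardless of the signal length, which is why convolutions scale to large signals such as images.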

Why now?¶

• GPUs that support vectorized processing (tensor operations)
• Large datasets

• Computationally speaking, a deep learning model can be formalized as a graph of tensor operations:
• Nodes perform tensor operations; and
• Results propagate along edges between nodes.
• Provides new ways of thinking about deep learning models.
• Recursive nature: each node is capable of sophisticated, non-trivial computation, perhaps leveraging another neural network
• Autodiff
• Techniques to evaluate the "derivative of a computer program"
• Deep learning frameworks
• PyTorch
• TensorFlow
• etc.
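
The graph view and autodiff can be illustrated with a toy reverse-mode implementation. This sketch works on scalars rather than tensors, supports only addition and multiplication, and the `Node` class is hypothetical; it is not how PyTorch or TensorFlow are implemented internally.

```python
class Node:
    """A scalar value in a computation graph that remembers how it was produced."""

    def __init__(self, value, parents=()):
        self.value = value
        self.parents = parents  # pairs of (parent node, local derivative)
        self.grad = 0.0

    def __add__(self, other):
        return Node(self.value + other.value, ((self, 1.0), (other, 1.0)))

    def __mul__(self, other):
        return Node(self.value * other.value,
                    ((self, other.value), (other, self.value)))

    def backward(self):
        """Reverse-mode autodiff: visit nodes in reverse topological order."""
        order, seen = [], set()

        def visit(node):
            if id(node) not in seen:
                seen.add(id(node))
                for parent, _ in node.parents:
                    visit(parent)
                order.append(node)

        visit(self)
        self.grad = 1.0
        for node in reversed(order):
            for parent, local in node.parents:
                # Chain rule: accumulate the contribution along each edge.
                parent.grad += node.grad * local


x = Node(3.0)
y = Node(4.0)
z = x * y + x * x  # dz/dx = y + 2x = 10, dz/dy = x = 3
z.backward()
print(x.grad, y.grad)  # prints 10.0 3.0
```

Each operation records its inputs and local derivatives as the program runs, so the derivative of the whole program falls out of a single backward pass over the recorded graph.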

Impact¶

• Image classification
• Face recognition
• Speech recognition
• Text-to-speech generation
• Handwriting transcription
• Medical image analysis and diagnosis