A brief history of Neural Networks¶

Faisal Qureshi
http://www.vclab.ca

Claude Shannon, Father of Information Theory.
I visualise a time when we will be to robots what dogs are to humans, and I’m rooting for the machines.

Jeff Hawkins, Founder of Palm Computing.

The key to artificial intelligence has always been the representation.

Lesson Plan¶

Computational models of Neurons
Pre-deep learning
ImageNet 2012
Takeaways
- What
- How
- Why now?
- Impact
Ethical and social implications

McCulloch and Pitts (1943)¶

Proposed a model of nervous systems as a network of threshold units.
Connections between simple units performing elementary operations give rise to intelligence.

Threshold units¶

Neuron (picture from Wikipedia)

Artificial neuron
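
A threshold unit can be sketched in a few lines. This is a minimal illustration, not the original 1943 formulation: the weighted sum, the function names, and the chosen weights and thresholds are assumptions made for the example. It shows that single units can compute simple logic functions.

```python
def threshold_unit(inputs, weights, threshold):
    """Fire (return 1) when the weighted sum of the inputs reaches the threshold."""
    total = sum(w * x for w, x in zip(weights, inputs))
    return 1 if total >= threshold else 0


def logical_and(a, b):
    # Both inputs must be active to reach a threshold of 2.
    return threshold_unit([a, b], weights=[1, 1], threshold=2)


def logical_or(a, b):
    # A single active input is enough to reach a threshold of 1.
    return threshold_unit([a, b], weights=[1, 1], threshold=1)


def logical_not(a):
    # A negative (inhibitory) weight: the unit fires only when the input is silent.
    return threshold_unit([a], weights=[-1], threshold=0)


for a in (0, 1):
    for b in (0, 1):
        print(a, b, "AND:", logical_and(a, b), "OR:", logical_or(a, b))
```

Wiring such units together (the output of one feeding the input of another) gives networks that compute more complex Boolean functions.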

Learning via reinforcing connections between Neurons (1949 to 1982)¶

Hebbian Learning¶

The Hebbian learning principle (Donald Hebb, 1949) proposes learning patterns by reinforcing connections between neurons that tend to fire together.
- Biologically plausible, but not used in practice
First artificial neural network consisting of 40 neurons (Marvin Minsky, 1951)
- Uses Hebbian Learning
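
The "fire together, wire together" idea can be written as a one-line update rule. The sketch below assumes the simplest textbook form, where each weight grows by the learning rate times the product of pre- and post-synaptic activity; the function name, the learning rate, and the activity patterns are hypothetical.

```python
def hebbian_update(weights, pre, post, lr=0.1):
    """Return new weights where weights[i][j] links input j (pre) to output i (post).

    Each connection is strengthened in proportion to the product of the
    activities of the two neurons it connects.
    """
    return [
        [w + lr * post[i] * pre[j] for j, w in enumerate(row)]
        for i, row in enumerate(weights)
    ]


# Two output neurons, three input neurons, no connections to begin with.
weights = [[0.0, 0.0, 0.0],
           [0.0, 0.0, 0.0]]

# Inputs 0 and 2 repeatedly fire together with output 0.
for _ in range(5):
    weights = hebbian_update(weights, pre=[1, 0, 1], post=[1, 0])

print(weights)
```

Only the connections between co-active neurons grow; all other weights stay at zero. Note that this plain rule has no mechanism for weakening connections, so weights grow without bound if the pattern keeps repeating.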

Perceptron¶

Frank Rosenblatt (1958) introduced the perceptron to classify 20x20 images
- A perceptron is a neural network comprising a single neuron
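
A single-neuron perceptron and its learning rule can be sketched as follows. This is a simplified illustration under stated assumptions: it learns the two-input OR function rather than 20x20 images, and the function names, learning rate, and number of epochs are chosen for the example.

```python
def predict(weights, bias, x):
    """Threshold the weighted sum of the inputs plus a bias."""
    s = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if s > 0 else 0


def train_perceptron(data, epochs=20, lr=1.0):
    """Perceptron learning rule: nudge the weights only when a prediction is wrong."""
    weights = [0.0] * len(data[0][0])
    bias = 0.0
    for _ in range(epochs):
        for x, target in data:
            error = target - predict(weights, bias, x)  # -1, 0, or +1
            weights = [w + lr * error * xi for w, xi in zip(weights, x)]
            bias += lr * error
    return weights, bias


# Logical OR is linearly separable, so a single neuron can learn it.
OR_DATA = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
weights, bias = train_perceptron(OR_DATA)

for x, target in OR_DATA:
    print(x, "target:", target, "predicted:", predict(weights, bias, x))
```

The same rule cannot learn functions that are not linearly separable, such as XOR, which is one motivation for networks with hidden units.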

Cat visual cortex¶

David Hubel and Torsten Wiesel studied the cat visual cortex and showed that visual information goes through a series of processing steps: 1) edge detection; 2) edge combination; 3) motion perception; etc. (Hubel and Wiesel, 1959)

Backpropagation¶

Backpropagation for artificial neural networks (Paul Werbos, 1982)
- An application of the chain rule from differential calculus
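
To make the chain rule concrete, here is a minimal sketch for a tiny network with one hidden unit, y = w2 * tanh(w1 * x), and a squared-error loss. The network, the variable names, and the finite-difference check are assumptions chosen for illustration, not a reconstruction of any particular historical implementation.

```python
import math


def forward_backward(w1, w2, x, t):
    """Return the loss and its gradients with respect to w1 and w2."""
    # Forward pass: compute and keep every intermediate value.
    z = w1 * x
    h = math.tanh(z)
    y = w2 * h
    loss = (y - t) ** 2

    # Backward pass: apply the chain rule from the loss back to the weights.
    dloss_dy = 2.0 * (y - t)
    dloss_dw2 = dloss_dy * h              # y = w2 * h
    dloss_dh = dloss_dy * w2
    dloss_dz = dloss_dh * (1.0 - h ** 2)  # derivative of tanh(z) is 1 - tanh(z)^2
    dloss_dw1 = dloss_dz * x              # z = w1 * x
    return loss, dloss_dw1, dloss_dw2


def numerical_grad(f, w, eps=1e-6):
    """Central finite difference, used only to sanity-check the analytic gradient."""
    return (f(w + eps) - f(w - eps)) / (2.0 * eps)


w1, w2, x, t = 0.5, -1.5, 2.0, 1.0
loss, g1, g2 = forward_backward(w1, w2, x, t)
print("analytic :", g1, g2)
print("numerical:",
      numerical_grad(lambda w: forward_backward(w, w2, x, t)[0], w1),
      numerical_grad(lambda w: forward_backward(w1, w, x, t)[0], w2))
```

The backward pass reuses values computed in the forward pass, which is what makes backpropagation cheap compared with perturbing each weight separately.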

Towards (deep) neural networks¶

Neocognitron¶

Fukushima (1980) implemented the Neocognitron, which was capable of handwritten character recognition.
- This model was based upon the findings of Hubel and Wiesel.
- This model can be seen as a precursor of modern convolutional networks.

Hidden units and backpropagation¶

Rumelhart et al. (1988) used backpropagation to train a network similar to Neocognitron.
- Units in hidden layers learn meaningful representations

LeNet¶

In 1989, LeCun et al. proposed LeNet, a convolutional neural network very similar to the networks that we see today.
- Capable of recognizing handwritten digits
- Trained using backpropagation

Deep learning (the beginning)¶

ImageNet Large Scale Visual Recognition Challenge¶

A large amount of training data is critical to the success of deep learning methods.
The ImageNet challenge was devised to measure the performance of various image recognition methods.
- 1 million images belonging to 1000 different classes
- Its size was key to the development of early deep learning models

Datasets¶

Datasets used for deep learning model development are divided into three sets:
- Training set is used to train the deep learning model;
- Validation set is used to tune the hyperparameters, implement early stopping, etc.; and
- Test set is used to evaluate model performance.
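
A three-way split can be sketched as below. The 80/10/10 proportions, the function name, and the fixed random seed are assumptions for illustration; appropriate fractions depend on the dataset size.

```python
import random


def split_dataset(examples, val_fraction=0.1, test_fraction=0.1, seed=0):
    """Shuffle once, then cut the data into disjoint train, validation, and test sets."""
    items = list(examples)
    random.Random(seed).shuffle(items)  # fixed seed keeps the split reproducible

    n_test = round(len(items) * test_fraction)
    n_val = round(len(items) * val_fraction)

    test = items[:n_test]
    val = items[n_test:n_test + n_val]
    train = items[n_test + n_val:]
    return train, val, test


train, val, test = split_dataset(range(100))
print(len(train), len(val), len(test))
```

Keeping the test set untouched until the very end matters: any decision made by looking at it (hyperparameters, early stopping) makes the reported performance optimistic.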

AlexNet (2012)¶

Krizhevsky et al. trained a convolutional network, similar to LeNet-5 but containing far more layers, neurons, and connections, on the ImageNet Challenge using Graphics Processing Units (GPUs). This model beat the state-of-the-art image classification methods by a large margin.
GPUs are critical to the success of deep learning methods.

Models may outperform humans!?

Deep learning takes over (2012 onwards)¶

Large datasets and vast GPU compute infrastructures led to larger and more complex deep learning models for solving problems in a variety of domains ranging
- from computer vision to speech recognition,
- from medical imaging to text understanding,
- from computer graphics to industrial design,
- from autonomous driving to drug discovery, etc.

Takeaways¶

What¶

Deep learning is a natural extension of the artificial neural networks of the 1990s.
- Extracts useful patterns from data
- Learns powerful representations
- Reduces the "semantic gap"

How¶

Chain rule (or backpropagation)
- Computes how error (or more generally, the quantity to optimize) changes when model parameters change
Stochastic gradient descent
- Iteratively updates network parameters to "minimize the error"
Convolutions
- Bakes in the intuition that signal is structured and often has some stationary properties
- Allows processing of large signals
Hidden layers
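
The first three ingredients can be combined in one small sketch: a 1D convolution whose kernel is learned with stochastic gradient descent, using gradients obtained from the chain rule. Everything here is an assumption made for illustration: the function names, the mean-squared-error objective, the learning rate, and the synthetic task of recovering a known kernel from random signals. As in most deep learning frameworks, "convolution" here means sliding the kernel without flipping it (cross-correlation).

```python
import random


def conv1d(signal, kernel):
    """Slide the same small kernel over every position of the signal."""
    k = len(kernel)
    return [sum(kernel[j] * signal[i + j] for j in range(k))
            for i in range(len(signal) - k + 1)]


def sgd_fit_kernel(true_kernel, steps=2000, lr=0.05, signal_len=16, seed=0):
    """Recover true_kernel from random signals, one randomly drawn signal per step."""
    rng = random.Random(seed)
    k = len(true_kernel)
    kernel = [0.0] * k
    for _ in range(steps):
        # Draw one random training example (the "stochastic" part).
        signal = [rng.uniform(-1.0, 1.0) for _ in range(signal_len)]
        target = conv1d(signal, true_kernel)
        output = conv1d(signal, kernel)

        # Chain rule: gradient of the mean squared error for each kernel weight.
        n = len(output)
        grad = [0.0] * k
        for i in range(n):
            err = output[i] - target[i]
            for j in range(k):
                grad[j] += 2.0 * err * signal[i + j] / n

        # Gradient descent step: move against the gradient.
        kernel = [w - lr * g for w, g in zip(kernel, grad)]
    return kernel


print(sgd_fit_kernel([0.5, -1.0, 0.25]))
```

Because the same three weights are reused at every position, the number of parameters does not depend on the signal length, which is what allows convolutions to process large signals.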

Why now?¶

GPUs that support vectorized processing (tensor operations)
Large datasets

Engineering advances¶

Computationally speaking, a deep learning model can be formalized as a graph of tensor operations:
- Nodes perform tensor operations; and
- Results propagate along edges between nodes.
This view provides new ways of thinking about deep learning models.
- Recursive nature: each node is capable of sophisticated, non-trivial computation, perhaps leveraging another neural network
Autodiff
- Techniques to evaluate the "derivative of a computer program"
Deep learning frameworks
- PyTorch
- TensorFlow
- etc.
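
The graph view and autodiff can be illustrated together with a toy reverse-mode automatic differentiation class that works on scalars. This is a sketch under stated assumptions: the class name `Value`, its attributes, and the supported operations (addition and multiplication only) are hypothetical, and it is not how PyTorch or TensorFlow are implemented internally.

```python
class Value:
    """A node in a computation graph that remembers how it was produced."""

    def __init__(self, data, parents=(), local_grads=()):
        self.data = data
        self.grad = 0.0
        self._parents = parents          # nodes this value was computed from
        self._local_grads = local_grads  # derivative of this value w.r.t. each parent

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        return Value(self.data + other.data, (self, other), (1.0, 1.0))

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        return Value(self.data * other.data, (self, other), (other.data, self.data))

    def backward(self):
        """Propagate gradients from this node back to every node it depends on."""
        order, seen = [], set()

        def visit(node):
            if id(node) not in seen:
                seen.add(id(node))
                for parent in node._parents:
                    visit(parent)
                order.append(node)

        visit(self)
        self.grad = 1.0
        # Reverse topological order: a node is processed before its parents.
        for node in reversed(order):
            for parent, local in zip(node._parents, node._local_grads):
                parent.grad += local * node.grad


x = Value(3.0)
y = Value(4.0)
z = x * y + x * x   # dz/dx = y + 2x, dz/dy = x
z.backward()
print(z.data, x.grad, y.grad)
```

Each arithmetic operation adds a node to the graph during the forward computation, and `backward` walks that graph once in reverse, which is the "derivative of a computer program" idea in miniature.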

Impact¶

Image classification
Face recognition
Speech recognition
Text-to-speech generation
Handwriting transcription
Medical image analysis and diagnosis
Ads
Cars: lane-keeping, automatic cruise control

Social and ethical implications¶

Myth
- Killer robots will enslave us
Reality
- Deep learning (and more generally, artificial intelligence) will have a profound effect on our society
  - Legal, social, philosophical, political, and personal
