
Deep Learning Fundamentals with Keras 4


A shallow neural network consists of a single hidden layer; there is no strict restriction on the number of neurons in that hidden layer. A deep neural network consists of many hidden layers, with a large number of neurons in each layer. Deep networks can automatically extract features and thus learn from data better.

Here are three factors that can be attributed to the recent surge in success of deep learning methods:

  1. The field itself has made progress. For example, the vanishing gradient problem was largely overcome with the ReLU activation function.
  2. Data has become more available. Large amounts of data need to be used in order to avoid over-fitting during training.
  3. Computational power has improved, too.

There are supervised and unsupervised deep learning algorithms.

Convolutional Neural Networks

A convolutional neural network (CNN) is an example of a supervised deep learning algorithm. These typically take an image in the input layer. Convolutions make the training more efficient. CNNs can solve problems involving image recognition, object detection, and other computer vision applications.

The architecture of a CNN is as follows. You have the input layer, which takes an image. Then there are convolutional layers, pooling layers, fully-connected layers, and the output layer. The fully-connected layers are necessary to generate the output.

An image can be a grey-scale image or a color image. Grey-scale images consist of a single two-dimensional array of pixels. Color images consist of three two-dimensional arrays of pixels (one array each for red, green, and blue). The input layer takes the image array (grey-scale) or image arrays (color).
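As a concrete illustration (with a made-up 28-by-28 size), the corresponding NumPy arrays would have these shapes:

import numpy as np

# hypothetical 28-by-28 grey-scale image: a single two-dimensional array of pixels
grey_image = np.zeros((28, 28))

# hypothetical 28-by-28 color image: three two-dimensional arrays of pixels,
# one each for the red, green, and blue channels
color_image = np.zeros((28, 28, 3))

print(grey_image.shape)
print(color_image.shape)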

The convolutional layers have filters that are convolved with the input. For example, let \(R\) be the \(m \times n\) matrix corresponding to a red image, and \(F\) be the \(p \times q\) matrix corresponding to a filter. The convolution of \(R\) with \(F\) will be a matrix \(C\) such that

\begin{equation*} C_{jk} = R(j, k) \cdot F \end{equation*}

Here \(R(j, k)\) is the \(p \times q\) block in \(R\) with \(R_{jk}\) in the top left corner. Note that the matrix dot product (also known as the Frobenius product) is taken here. Here is some Python code with an example:

import numpy as np

# 5-by-5 input image
R = np.matrix([
    [25, 110, 65, 0, 34],
    [15, 14, 10, 54, 66],
    [65, 76, 54, 200, 210],
    [176, 5, 73, 67, 89],
    [63, 90, 95, 111, 175],
])

# 2-by-2 filter
F = np.matrix([
    [0, 1],
    [0, 1],
])

# each entry of C is the Frobenius product of a 2-by-2 block of R with F (via tensordot)
C = np.matrix([
    [np.tensordot(R[j:j+2, k:k+2], F) for k in np.arange(4)] for j in np.arange(4)
])

print(C)

The result for the convolution is

[[124  75  54 100]
 [ 90  64 254 276]
 [ 81 127 267 299]
 [ 95 168 178 264]]

Convolution is a great way to decrease the number of parameters, instead of just flattening the image as a one-dimensional array. After the convolution step, a ReLU layer may be applied.
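As a minimal sketch of that optional step: a ReLU replaces every negative entry with zero, and can be applied element-wise with NumPy. (The example matrix C above has no negative entries, so it would be left unchanged; general filters can contain negative values, in which case the effect is non-trivial.)

# ReLU applied element-wise: negative entries become zero
A = np.maximum(C, 0)

print(A)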

The pooling layers reduce the spatial dimensions of the data. One kind of pooling layer is MaxPooling, where one keeps the largest value of the region that is scanned. For example, MaxPooling on the convolution matrix from above with a (2, 2) stride gives

# keep the largest value in each 2-by-2 block of C
MaxPooling = np.matrix([
    [np.max(C[j:j+2, k:k+2]) for k in np.arange(4, step=2)] for j in np.arange(4, step=2)
])

print(MaxPooling)

The result is

[[124 276]
 [168 299]]

MaxPooling is one of the most common pooling layers. Another kind is AveragePooling or MeanPooling, where one keeps the mean value of the region that is scanned. For example, MeanPooling on the convolution matrix from above with a (2, 2) stride gives

# keep the mean value of each 2-by-2 block of C (truncated to an integer)
MeanPooling = np.matrix([
    [np.mean(C[j:j+2, k:k+2]) for k in np.arange(4, step=2)] for j in np.arange(4, step=2)
], dtype=int)

print(MeanPooling)

The result is

[[ 88 171]
 [117 252]]

Pooling also provides a degree of spatial invariance to the CNN, which enables it to recognize objects that do not exactly resemble the objects it was trained on.

In the fully-connected part of the network, you flatten the output of the previous layer into a one-dimensional array; the final fully-connected layer uses as many neurons as there are classes of objects for classification.
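For instance, flattening the MaxPooling matrix from the pooling example above gives a one-dimensional array:

# flatten the pooled 2-by-2 output into a one-dimensional array
flattened = np.asarray(MaxPooling).flatten()

print(flattened)

The result is

[124 276 168 299]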

CNNs in Keras

CNNs can be implemented in Keras as follows. You start with a sequential model:

import keras

from keras.models import Sequential

model = Sequential()

Of course, you also need layers:

from keras.models import Sequential
from keras.layers import Dense
from keras.utils import to_categorical
from keras.layers import Conv2D
from keras.layers import MaxPooling2D
from keras.layers import Flatten

And data:

from keras.datasets import mnist

You load the data, which is already split into training and testing sets:

(X_train, y_train), (X_test, y_test) = mnist.load_data()

You need to reshape the data for proper processing:

import numpy as np

# add an explicit single-channel dimension: (samples, 28, 28, 1)
X_train = np.reshape(
    X_train,
    (X_train.shape[0], 28, 28, 1),
).astype('float32')

X_test = np.reshape(
    X_test,
    (X_test.shape[0], 28, 28, 1),
).astype('float32')

And normalize the data:

X_train = X_train / 255
X_test = X_test / 255

The target data must be put into one-hot binary categories:

y_train = to_categorical(y_train)
y_test = to_categorical(y_test)

# number of target classes, needed later for the output layer
num_classes = y_test.shape[1]

Now we are ready to roll. Here is a convolutional layer, followed by a pooling layer:

# Convolution layer
model.add(Conv2D(
    16,
    (5, 5),
    strides=(1, 1),
    activation='relu',
    input_shape=(28, 28, 1),
))

# Pooling layer
model.add(MaxPooling2D(
    pool_size=(2, 2),
    strides=(2, 2),
))

Another possibility is to use multiple convolutional and pooling layers:

# First Convolution layer
model.add(Conv2D(
    16,
    (5, 5),
    strides=(1, 1),
    activation='relu',
    input_shape=(28, 28, 1),
))

# First Pooling layer
model.add(MaxPooling2D(
    pool_size=(2, 2),
    strides=(2, 2),
))

# Second Convolution layer (only the first layer needs input_shape)
model.add(Conv2D(
    8,
    (5, 5),
    strides=(1, 1),
    activation='relu',
))

# Second Pooling layer
model.add(MaxPooling2D(
    pool_size=(2, 2),
    strides=(2, 2),
))

After this you can add a flattening layer:

model.add(Flatten())

Now you are ready for the fully-connected layer:

model.add(Dense(
    100,
    activation='relu',
))

Finally, you have the output layer:

model.add(Dense(
    num_classes,
    activation='softmax',
))

You compile the model:

model.compile(
    optimizer='adam',
    loss='categorical_crossentropy',
    metrics=['accuracy'],
)

You fit the model (training):

model.fit(
    X_train, y_train,
    validation_data=(X_test, y_test),
    epochs=10,
    batch_size=200,
    verbose=2,
)

Finally, you evaluate the model:

scores = model.evaluate(X_test, y_test, verbose=0)

print(scores)
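Since the model was compiled with metrics=['accuracy'], the list returned by evaluate contains the loss followed by the accuracy, so the two can be printed separately:

# scores[0] is the test loss, scores[1] is the test accuracy
print('Loss: {}'.format(scores[0]))
print('Accuracy: {}'.format(scores[1]))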

You can compare this to the network previously used with the MNIST data, which only had dense layers.

Recurrent Neural Networks

Another kind of supervised deep learning model is the recurrent neural network (RNN). In the previous examples, each data point was assumed to be an independent instance. RNNs are neural networks with loops: at each step they take a new input as well as the output from the previous step.

For example, consider the following RNN. For the first step, the input is \(x_{0}\), and you have a weight \(w_{0}\) and a bias \(b_{0}\). You compute

\begin{equation*} x_{0} \longrightarrow z_{0} = w_{0} x_{0} + b_{0} \longrightarrow a_{0} = f(z_{0}) \end{equation*}

The output \(a_{0}\) appears weighted by a recursion weight \(w_{01}\) in the computation for the next input:

\begin{equation*} x_{1} \longrightarrow z_{1} = w_{1} x_{1} + b_{1} + w_{01} a_{0} \longrightarrow a_{1} = f(z_{1}) \end{equation*}

And similarly for the next case:

\begin{equation*} x_{2} \longrightarrow z_{2} = w_{2} x_{2} + b_{2} + w_{12} a_{1} \longrightarrow a_{2} = f(z_{2}) \end{equation*}

And so on. These kinds of algorithms have a sort of temporal dimension. A popular type of RNN is the Long Short-Term Memory (LSTM) model, which has been used for many applications.
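Here is a minimal NumPy sketch of this recursion, with made-up values for the inputs, weights, biases, and recursion weights, and a ReLU as the activation \(f\):

import numpy as np

def f(z):
    # activation function (a ReLU in this sketch)
    return np.maximum(z, 0)

# made-up inputs, weights, biases, and recursion weights
x = [0.5, -1.0, 2.0]
w = [0.1, 0.2, 0.3]
b = [0.0, 0.1, 0.2]
w_rec = [0.4, 0.5]

a_prev = 0.0
for t in range(3):
    z = w[t] * x[t] + b[t]
    if t > 0:
        # the previous output enters, weighted by the recursion weight
        z = z + w_rec[t - 1] * a_prev
    a_prev = f(z)
    print(a_prev)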

Autoencoders

Autoencoders are an example of an unsupervised deep learning model. Autoencoding refers to a data compression algorithm where the compression and decompression functions are learned automatically from the data, rather than engineered by hand. An autoencoder will only work well on data similar to the data it was trained on. Some example applications of autoencoders are data denoising and dimensionality reduction.

The basic architecture consists of an encoder, which finds an optimal compressed representation of the input, and a decoder, which reconstructs the input from that representation. In a sense, it is a non-trivial version of the identity operation. Due to the use of non-linear activation functions, autoencoders can learn data projections that are more sophisticated than basic techniques like principal component analysis.
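As a rough sketch (not from the original notes), a very small dense autoencoder for flattened 28-by-28 images could be built in Keras along these lines, with the hidden size of 32 chosen arbitrarily:

from keras.models import Sequential
from keras.layers import Dense

autoencoder = Sequential()

# encoder: compress the 784-dimensional input down to 32 dimensions
autoencoder.add(Dense(32, activation='relu', input_shape=(784,)))

# decoder: reconstruct the 784-dimensional input from the compressed representation
autoencoder.add(Dense(784, activation='sigmoid'))

autoencoder.compile(optimizer='adam', loss='mean_squared_error')

# training would use the same array as both input and target:
# autoencoder.fit(X, X, epochs=10, batch_size=200)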

The Restricted Boltzmann Machine (RBM) is a very popular type of autoencoder. These can be used for fixing imbalanced data sets, estimating missing values, or automatic feature extraction from unstructured data.