An Introduction to Deep Neural Networks

This blog article is based on what I learned from version 3 of the FastAI lectures (2019). It carefully summarizes the lessons of Jeremy Howard.

Introduction to a Neural Network

The term ‘Neural Network’, in the context of ‘Deep Learning’ (a subset of Machine Learning), refers to models that can classify images with high confidence, detect and recognize objects, and generate images or text from little input, among many other use cases. So-called ‘deep neural networks’ allow humanity to outsource many manual tasks: such a network is first trained on a task and then, once trained, used for inference.

Why do Neural Networks work?

The so-called ‘Universal Approximation Theorem’ explains why deep neural networks are so good at solving human tasks like classification. The theorem states: ‘In the mathematical theory of artificial neural networks, universal approximation theorem type results state that a feed-forward network constructed of artificial neurons can approximate arbitrarily well real-valued continuous functions on compact subsets of ℝⁿ.’ This means that if you stack linear functions and non-linearities (i.e., deep neural networks), your model can approximate any continuous function arbitrarily closely. Given the millions of parameters spread across many layers, a deep neural network has enough flexibility to represent nature’s complexities. But what is actually approximated and processed in a deep neural network? Essentially, it is just the multiplication of matrices. The foundations are quite simple.
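As a minimal illustration of how combining linear maps with a non-linearity produces a non-linear function, the sketch below (pure NumPy, illustrative names) shows that two ReLU neurons with weights +1 and −1, summed by an output layer, exactly reproduce the absolute-value function:

```python
import numpy as np

def relu(x):
    # the rectified linear unit: max(0, x), a common non-linearity
    return np.maximum(0.0, x)

def tiny_net(x):
    # two neurons (weights +1 and -1), summed with output weights (1, 1):
    # relu(x) + relu(-x) == |x|, a non-linear function built from
    # linear maps plus a non-linearity
    return relu(1.0 * x) + relu(-1.0 * x)

xs = np.linspace(-2, 2, 9)
print(np.allclose(tiny_net(xs), np.abs(xs)))  # True
```

With more neurons and layers, the same mechanism can piece together arbitrarily fine piecewise-linear approximations of any continuous function.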

Mathematical Foundations behind Neural Networks

The following two images from the excellent YouTube series by 3Blue1Brown show how Computer Vision is made mathematical for Deep Learning purposes. Here, we can see how a simple image of a digit is broken down into individual pixels.


And this one complements the previous one by showing how that pixel representation flows into a deep neural net.


Generally, Neural Networks (deep or shallow) boil down to dot products of weights (i.e., parameters) and activations. The weights store the learned data and are used in the calculations; the activations are the results of those calculations. In each layer of a Neural Network, matrices are multiplied.

Matrix Multiplication Example

For example, a (5, 3) weight matrix multiplied with a (3, 1) input vector results in a (5, 1) matrix. The resulting matrix is then run through an activation function. The activated (5, 1) matrix is, in turn, used to compute the next layer’s values. Due to the rules of matrix multiplication, the next weight matrix must have five columns, e.g. (8, 5), producing an (8, 1) result, and so on.
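The shape rules above can be checked directly in NumPy. This is a minimal sketch with random weights (all names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

x  = rng.normal(size=(3, 1))   # input column vector
W1 = rng.normal(size=(5, 3))   # first weight matrix: (5,3) @ (3,1) -> (5,1)
h  = np.maximum(0.0, W1 @ x)   # pass the result through an activation (here: ReLU)

W2 = rng.normal(size=(8, 5))   # next weights need 5 columns: (8,5) @ (5,1) -> (8,1)
y  = W2 @ h

print(h.shape, y.shape)        # (5, 1) (8, 1)
```

The inner dimensions must always match, which is why each layer’s weight matrix is shaped by the size of the previous layer’s activation.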

Input Types and Preprocessing

| Input Types | Preprocessing | Input |
| --- | --- | --- |
| Text, Tabular, Images, Audio, Video | Tokenization, Numericalization, Principal Component Analysis, Matrix Factorization, Warping, Zooming, … | From the preprocessed input, take a random mini-batch for Stochastic Gradient Descent. |
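As a minimal sketch of two of the text-preprocessing steps named above, tokenization (splitting text into tokens) and numericalization (mapping tokens to integer ids) can look like this — real libraries such as fastai or spaCy do this far more carefully, and the function names here are illustrative:

```python
def tokenize(text):
    # simplest possible tokenizer: lowercase and split on whitespace
    return text.lower().split()

def numericalize(tokens, vocab):
    # map each token to its integer id; unknown tokens get id 0 in this sketch
    return [vocab.get(tok, 0) for tok in tokens]

corpus = "deep learning makes deep networks learn"
tokens = tokenize(corpus)

# build a vocabulary from the unique tokens, in order of first appearance
vocab = {tok: i + 1 for i, tok in enumerate(dict.fromkeys(tokens))}

ids = numericalize(tokens, vocab)
print(tokens)
print(ids)  # [1, 2, 3, 1, 4, 5]
```

The resulting integer ids are what actually enters the network (typically via an embedding layer), since matrices can only multiply numbers, not strings.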


Neural Net Architectures

The following image depicts a Neural Network Architecture.

Neural Network Architecture

The presented terms are defined as:

Activation Functions

Activation functions introduce non-linearity and thereby, e.g., allow for the classification of certain objects. The terms relating to the activation function are:

a(Layer+1) = activation function( W(Layer) * a(Layer) + b(Layer) )
With activation function: e.g. sigmoid in hidden layers, softmax in the output layer
With W: weights
With a: activations
With b: bias
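The layer formula above can be sketched in a few lines of NumPy. This toy forward pass uses random weights purely for illustration:

```python
import numpy as np

def sigmoid(z):
    # squashes each value into (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    # turns a vector into a probability distribution (sums to 1)
    e = np.exp(z - z.max())   # subtract the max for numerical stability
    return e / e.sum()

rng = np.random.default_rng(0)
a = rng.normal(size=(3, 1))   # activations from the previous layer
W = rng.normal(size=(4, 3))   # weights
b = rng.normal(size=(4, 1))   # bias

z = W @ a + b                 # the linear part: W(Layer) * a(Layer) + b(Layer)
hidden = sigmoid(z)           # hidden-layer activation
probs  = softmax(z)           # output-layer activation: class probabilities

print(probs.sum())            # 1.0
```

Softmax is typically reserved for the final layer because its output can be read as class probabilities, while sigmoid (or ReLU) is used between layers.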

Back-propagation calculates the reverse pass: starting from the loss, it applies the chain rule layer by layer to obtain the gradient of the loss with respect to each weight and bias.
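On the smallest possible network, the chain rule can be verified by hand. This sketch uses y = w * x with squared-error loss L = (y − t)², so dL/dw = 2·(y − t)·x, and checks the derived gradient against a numerical finite-difference estimate (all names are illustrative):

```python
def loss(w, x, t):
    # squared error between prediction w*x and target t
    return (w * x - t) ** 2

def grad_w(w, x, t):
    # chain rule: dL/dw = dL/dy * dy/dw = 2*(y - t) * x
    return 2.0 * (w * x - t) * x

w, x, t = 0.5, 2.0, 3.0
eps = 1e-6
numeric = (loss(w + eps, x, t) - loss(w - eps, x, t)) / (2 * eps)
print(grad_w(w, x, t), round(numeric, 4))  # both approximately -8.0
```

Back-propagation applies exactly this chain-rule step repeatedly, from the output layer back to the input layer, reusing intermediate results along the way.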

Fitting and Regularization

A major goal when training deep neural networks is to avoid over- and underfitting. This can be influenced by proper weight initialization and by the settings of stochastic gradient descent.


The Learning Process: Stochastic Gradient Descent

Learning Rate

Task Overview NLP

This table gives an overview of the current deep learning applications in the field of Natural Language Processing.


Try some out yourself

Colab on Text Generation

Where are we heading?

The next step may well be AGI… Google DeepMind is researching it, and it is super cool.
