A Primer On Neural Networks


Any system has a certain number of inputs, a certain number of outputs, and some relationship between them. A generic system with m inputs and p outputs can be represented as a function:

f (x1, x2, … , xm) = (y1, y2, … , yp)

Suppose you are given a dataset of such inputs and their corresponding outputs. You can discover the relationship between them using various machine learning techniques, one of which is neural networks. A generic neural network with several layers is called a deep neural network. It represents the system and discovers the relationship using an iterative method, as illustrated here. The dataset used for discovering the relationship is called the training data, and the discovered relationship is called the trained model. The trained model is used for making predictions, that is, computing outputs for new inputs that are not in the training data.


A deep neural network represents such a system as:

[Figure: Deep Neural Networks]

It is a neural network with k layers. It has an input layer of m inputs and k-1 hidden layers: the first hidden layer has n neurons and the last hidden layer has i neurons. The output layer has p neurons. A weight w is associated with every edge of the network. The weight for each edge is identified by $w^{k}_{ij}$, where the superscript k identifies the layer of the network to which the edge belongs, and the subscript consists of two numbers: the first, i, is the neuron number in the layer on the source side of the edge, and the second, j, is the neuron number in the layer on the target side. Each neuron in the hidden and output layers computes a linear weighted sum Σ of its input values, followed by a nonlinear activation ʃ. The summation is computed as:

$$\sum_{i=1}^{m} x_i \, w^{l}_{ij}$$

This is computed for every jth neuron in the lth layer of the network, where $x_i$ is the ith of the m inputs to the neuron. The summation is followed by a nonlinear activation function such as sigmoid or ReLU. This nonlinear activation is what gives the neural network the ability to compute very complex functions by stacking multiple layers. To give a glimpse of the non-linearity it introduces into the equation, the following diagram shows how normalized versions of various sigmoid functions map values on the X axis in the range (-∞, ∞) to values on the Y axis in the range (-1, 1):

[Figure: Various Sigmoid Functions]
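As a rough numerical counterpart to the diagram, the sketch below evaluates three sigmoid-shaped functions whose outputs all lie in (-1, 1). The choice of these three functions and the helper names are assumptions made for illustration; the original diagram may show a different set.

```python
import math

# Three sigmoid-shaped functions, each mapping (-inf, inf) into (-1, 1).
# This particular selection is an assumption made for illustration.
def tanh_sigmoid(x: float) -> float:
    return math.tanh(x)

def erf_sigmoid(x: float) -> float:
    # Input scaled so that the slope at the origin is 1, like tanh.
    return math.erf(math.sqrt(math.pi) / 2.0 * x)

def algebraic_sigmoid(x: float) -> float:
    return x / math.sqrt(1.0 + x * x)

SIGMOIDS = {
    "tanh": tanh_sigmoid,
    "erf": erf_sigmoid,
    "algebraic": algebraic_sigmoid,
}

# Print each function at a few sample points to show the S-shaped squashing.
for x in (-3.0, -1.0, 0.0, 1.0, 3.0):
    row = ", ".join(f"{name}={fn(x):+.3f}" for name, fn in SIGMOIDS.items())
    print(f"x={x:+.1f}: {row}")
```

All three pass through the origin and flatten out toward -1 and +1, which is the non-linearity the text refers to.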

These weights w are often called the parameters of the neural network.
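The weighted sum and activation described above can be sketched in a few lines of plain Python. This is a minimal illustration, not a reference implementation: the function names, the logistic activation, and the example weights are all assumptions, and bias terms are left out because the text does not mention them.

```python
import math

def sigmoid(z: float) -> float:
    """Logistic activation, mapping any real z into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def layer_forward(inputs, weights):
    """Compute the outputs of one layer.

    weights[i][j] is the weight on the edge from input i to neuron j,
    following the source-then-target subscript order used in the text.
    """
    n_neurons = len(weights[0])
    outputs = []
    for j in range(n_neurons):
        # Linear weighted sum over all inputs to neuron j.
        z = sum(x_i * weights[i][j] for i, x_i in enumerate(inputs))
        # Nonlinear activation applied to the sum.
        outputs.append(sigmoid(z))
    return outputs

def network_forward(inputs, layers):
    """Feed the inputs through each layer in turn."""
    activations = inputs
    for weights in layers:
        activations = layer_forward(activations, weights)
    return activations

# Hypothetical network: 2 inputs, one hidden layer of 3 neurons, 1 output.
hidden = [[0.5, -0.2, 0.1],
          [0.3, 0.8, -0.5]]
output = [[1.0], [-1.0], [0.5]]
y = network_forward([1.0, 2.0], [hidden, output])
print(y)
```

Every number in `hidden` and `output` is one of the parameters that training has to discover.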


Here is a typical iterative neural network training algorithm:

[Figure: iterative neural network algorithm]

A popular algorithm for iteratively updating the weights is gradient descent.
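To make the idea concrete, here is a minimal sketch of gradient descent on the simplest possible case: a single neuron with no activation function, trained to minimize the mean squared error. The loss function, learning rate, epoch count, and toy dataset are all assumptions chosen for illustration; training a real deep network also requires backpropagation to obtain gradients for the hidden layers.

```python
def predict(weights, x):
    """Weighted sum of the inputs (a single linear neuron)."""
    return sum(w * x_i for w, x_i in zip(weights, x))

def gradient_descent(data, n_inputs, learning_rate=0.1, epochs=500):
    """Fit the weights by repeatedly stepping against the gradient of the MSE."""
    weights = [0.0] * n_inputs
    for _ in range(epochs):
        grads = [0.0] * n_inputs
        for x, y in data:
            error = predict(weights, x) - y
            for i in range(n_inputs):
                # d/dw_i of mean((prediction - y)^2)
                grads[i] += 2.0 * error * x[i] / len(data)
        # Move each weight a small step in the direction that lowers the loss.
        weights = [w - learning_rate * g for w, g in zip(weights, grads)]
    return weights

# Toy training data generated from y = 2*x1 - 3*x2 (hypothetical).
data = [((1.0, 0.0), 2.0), ((0.0, 1.0), -3.0),
        ((1.0, 1.0), -1.0), ((2.0, 1.0), 1.0)]
weights = gradient_descent(data, n_inputs=2)
print(weights)
```

The learned weights approach 2 and -3, the relationship hidden in the training data, and `predict` can then be called on inputs that were not in it.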

Some Specialized Neural Networks

Convolutional Neural Networks (CNN or ConvNet)

These are typically used on two- or three-dimensional images or on spectrograms of audio samples. They exploit the proximity of pixels in an image. They are built from convolutional, pooling, fully connected, and normalization layers.
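The following sketch shows what a convolutional layer and a pooling layer compute on a tiny grayscale image. It assumes no padding, a stride of 1 for the convolution, and non-overlapping windows for the pooling; the function names, the image, and the kernel are hypothetical.

```python
def conv2d(image, kernel):
    """Slide the kernel over the image (no padding, stride 1).

    Each output value depends only on a small neighborhood of pixels,
    which is how a CNN uses pixel proximity.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[sum(image[r + i][c + j] * kernel[i][j]
                 for i in range(kh) for j in range(kw))
             for c in range(out_w)]
            for r in range(out_h)]

def max_pool2d(feature_map, size=2):
    """Keep the largest value in each non-overlapping size x size window."""
    return [[max(feature_map[r + i][c + j]
                 for i in range(size) for j in range(size))
             for c in range(0, len(feature_map[0]) - size + 1, size)]
            for r in range(0, len(feature_map) - size + 1, size)]

# 5x5 image with a vertical edge between the second and third columns.
image = [[0, 0, 1, 1, 1] for _ in range(5)]
# 2x2 kernel that responds to a dark-to-bright step from left to right.
kernel = [[-1, 1],
          [-1, 1]]

features = conv2d(image, kernel)  # 4x4 feature map
pooled = max_pool2d(features)     # 2x2 after pooling
print(features)
print(pooled)
```

The feature map is nonzero only where the edge sits, and pooling shrinks it while keeping that response.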


Recurrent Neural Networks (RNN)

These are typically used for handwriting recognition, speech recognition, and natural language processing (NLP). They keep an internal memory that is used when computing each output, rather than looking only at the previous layer. A widely used variant is the LSTM (long short-term memory) network.
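A minimal sketch of the internal memory, assuming the simplest recurrent cell (a tanh of the weighted current input plus the weighted previous state) rather than an LSTM. The function names, weights, and input sequence are hypothetical, and biases are omitted.

```python
import math

def rnn_step(x, h_prev, w_xh, w_hh):
    """One recurrent step: combine the current input with the previous state.

    w_xh[i][j] weights input i into hidden neuron j;
    w_hh[k][j] weights previous hidden neuron k into hidden neuron j.
    """
    n_hidden = len(h_prev)
    h_new = []
    for j in range(n_hidden):
        z = sum(x[i] * w_xh[i][j] for i in range(len(x)))
        z += sum(h_prev[k] * w_hh[k][j] for k in range(n_hidden))
        h_new.append(math.tanh(z))
    return h_new

def run_rnn(sequence, w_xh, w_hh, n_hidden):
    """Process a sequence, carrying the hidden state from step to step."""
    h = [0.0] * n_hidden
    states = []
    for x in sequence:
        h = rnn_step(x, h, w_xh, w_hh)
        states.append(h)
    return states

# One input feature, two hidden neurons (hypothetical weights).
w_xh = [[0.5, -0.5]]
w_hh = [[0.9, 0.0],
        [0.0, 0.9]]
# A single nonzero input followed by zeros.
sequence = [[1.0], [0.0], [0.0]]
states = run_rnn(sequence, w_xh, w_hh, n_hidden=2)
print(states)
```

Even though the second and third inputs are zero, the hidden state stays nonzero: the network still remembers the first input.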

Capsule Network (CapsNet)

These address limitations of CNNs when dealing with images that contain rotation, tilt, or different orientations. A capsule is a nested set of neural layers. It outputs a vector, instead of the scalar output by neurons in earlier neural networks.
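To illustrate what a vector-valued output looks like, the sketch below applies a "squash" nonlinearity to a capsule's raw output vector. It assumes the squashing function from the original CapsNet formulation, which is not described in this primer: it keeps the vector's direction and compresses its length into the range [0, 1). The function name and sample vector are hypothetical.

```python
import math

def squash(s):
    """Shrink vector s to a length below 1 while preserving its direction."""
    norm_sq = sum(c * c for c in s)
    if norm_sq == 0.0:
        # A zero vector has no direction; return it unchanged.
        return [0.0] * len(s)
    norm = math.sqrt(norm_sq)
    # Scale factor: (|s|^2 / (1 + |s|^2)) / |s|
    scale = norm_sq / (1.0 + norm_sq) / norm
    return [scale * c for c in s]

raw = [3.0, 4.0]  # raw capsule output with length 5
v = squash(raw)
length = math.sqrt(sum(c * c for c in v))
print(v, length)
```

In this formulation the length of the output vector can be read as a confidence, while its direction carries properties such as orientation.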