Convolutional Neural Networks (CNNs) are deep neural networks used primarily for computer vision, an architecture that is particularly effective with image data. If we train a CNN on images of dogs, it learns dog characteristics such as color, tail, and legs. Using these features, the model can differentiate a dog from any other animal.
Artificial Neural Networks (ANNs) can also be used for image recognition, but there are certain issues with applying ANN models to image data. If we use a fully connected ANN on raw images, we end up with an enormous number of parameters. Another problem is that images must be flattened into a vector before being fed into an ANN, and when we flatten the data we lose all of its two-dimensional structure. Lastly, ANNs only perform well when the images are extremely similar to those seen during training. A CNN alleviates all of these issues with its layers.
A computer reads an image as a matrix of shape (height × width × depth). Color images are formed by combining red, green, and blue channels, so their depth is 3. A convolutional layer is created when we apply multiple image filters to the input; these filters can also pick out different colors in an image. The layer is trained to find the best filter weight values for the network. A CNN also reduces the number of parameters through local connectivity: in a convolutional layer, neurons are not fully connected. Instead, each neuron is connected only to a small local region of the input, and the shared weights of these connections form the filters. We can have as many filters as we want in a convolutional layer. Often, convolutional layers are fed into further convolutional layers. This allows the network to discover patterns within patterns, with more complexity in each later layer, and is what lets the model differentiate between a cat and a dog even when both animals share similar colors and characteristics.
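To make the idea of a filter concrete, here is a minimal NumPy sketch of a single filter sliding over a toy grayscale image (the image values and the vertical-edge filter are made up for illustration; in a real CNN the filter weights are learned during training):

```python
import numpy as np

def conv2d(image, kernel):
    """Slide the kernel over the image, taking a dot product at each
    position (no padding, stride 1)."""
    h, w = image.shape
    kh, kw = kernel.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# Toy image: dark on the left, bright on the right
image = np.array([
    [0, 0, 0, 9, 9, 9],
    [0, 0, 0, 9, 9, 9],
    [0, 0, 0, 9, 9, 9],
    [0, 0, 0, 9, 9, 9],
], dtype=float)

# A hand-made vertical-edge filter
edge_filter = np.array([[1, 0, -1],
                        [1, 0, -1],
                        [1, 0, -1]], dtype=float)

# The output responds strongly only where the dark/bright boundary lies
print(conv2d(image, edge_filter))
```

The resulting feature map is large in magnitude exactly where the filter's pattern (a vertical edge) appears in the image, which is how a convolutional layer detects local features.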
An activation function is applied at the end of, or between, layers of a neural network and helps the network learn complex patterns in the data. It takes the output signal from the previous layer and converts it into a form that can be used as input to the next layer. There are many activation functions; the most commonly used one is ReLU, the Rectified Linear Unit.
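ReLU itself is a very simple function: it keeps positive values and replaces negatives with zero. A one-line sketch:

```python
def relu(x):
    """ReLU passes positive values through and zeroes out negatives."""
    return max(0.0, x)

# Negatives become 0; positives are unchanged
print([relu(v) for v in [-2.0, -0.5, 0.0, 1.5, 3.0]])
```

Despite its simplicity, this non-linearity is what lets stacked layers represent patterns that a purely linear network could not.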
Even with local connectivity, when dealing with color images and possibly ten or more filters, we will still have many parameters. We can use pooling layers to reduce this. A pooling layer accepts a convolutional layer as input and performs a sub-sampling (down-sampling) operation on it. There are different types of pooling layers, each using a different down-sampling technique; max pooling is the most common.
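As an illustration, here is a minimal NumPy sketch of 2×2 max pooling, which keeps only the largest value in each non-overlapping window (the feature-map values are made up for the example):

```python
import numpy as np

def max_pool(feature_map, size=2):
    """2x2 max pooling: keep the largest value in each
    non-overlapping size x size window."""
    h, w = feature_map.shape
    out = np.zeros((h // size, w // size))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            window = feature_map[i * size:(i + 1) * size,
                                 j * size:(j + 1) * size]
            out[i, j] = window.max()
    return out

fm = np.array([[1, 3, 2, 4],
               [5, 6, 1, 0],
               [7, 2, 9, 8],
               [0, 1, 3, 4]], dtype=float)

# A 4x4 feature map shrinks to 2x2, keeping the strongest activations
print(max_pool(fm))
```

Note how the output has a quarter of the original values: this is what reduces the parameter count in the layers that follow, while keeping the strongest filter responses.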
Fully Connected Layer
The last type of layer in a CNN is the fully connected layer. It sits at the end of the network and fully connects to the results from the previous layer. It is followed by an output layer whose number of neurons equals the number of classes in the given application.
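Mechanically, a fully connected layer is a matrix multiplication: every flattened feature connects to every output neuron. A minimal NumPy sketch, assuming a hypothetical 16-element flattened feature map and 3 output classes:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes for illustration: a flattened 4x4 feature map
# feeding an output layer with 3 classes
features = rng.standard_normal(16)       # flattened output of earlier layers
weights = rng.standard_normal((3, 16))   # one weight row per output class
bias = np.zeros(3)

# Fully connected: every feature contributes to every class score
logits = weights @ features + bias

# Softmax turns the scores into class probabilities that sum to 1
probs = np.exp(logits) / np.exp(logits).sum()
print(probs.round(3))
```

The class with the highest probability becomes the network's prediction.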
We can have any combination of convolutional and pooling layers.
The basic structure of a Convolutional Neural Network has a single convolutional layer, a pooling layer, and a fully connected layer.
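Putting the pieces together, here is a toy end-to-end NumPy sketch of that basic structure, tracing how the data's shape changes at each stage (all sizes and the random weights are made up for illustration; a real network would learn the weights):

```python
import numpy as np

rng = np.random.default_rng(1)
image = rng.standard_normal((8, 8))          # toy 8x8 grayscale input

# Convolutional layer: one 3x3 filter, stride 1, no padding -> 6x6 map
kernel = rng.standard_normal((3, 3))
fmap = np.array([[np.sum(image[i:i + 3, j:j + 3] * kernel)
                  for j in range(6)] for i in range(6)])

fmap = np.maximum(fmap, 0)                   # ReLU activation

# Pooling layer: 2x2 max pooling -> 3x3
pooled = np.array([[fmap[2 * i:2 * i + 2, 2 * j:2 * j + 2].max()
                    for j in range(3)] for i in range(3)])

# Fully connected layer: flatten 3x3 -> 9 features, map to 2 classes
flat = pooled.ravel()
w, b = rng.standard_normal((2, 9)), np.zeros(2)
logits = w @ flat + b
probs = np.exp(logits) / np.exp(logits).sum()

print(image.shape, fmap.shape, pooled.shape, probs.shape)
```

Each stage shrinks the spatial dimensions (8×8 → 6×6 → 3×3) until the fully connected layer produces one probability per class.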
Applications of CNNs
There are many applications of CNNs, such as:
- Facial Recognition
- Analyzing Documents
- Historic and Environmental Collection
- Understanding Climate
- Grey Areas
- Medical Industry
Let us look at facial recognition in more detail.
CNNs power computer vision algorithms that identify, interpret, and understand the visual world. Facial recognition models identify and verify people from pictures of their faces. Humans perform this task with ease; machines are not naturally as accurate, but by training a model on enough data sets, machines can match or even outperform humans. For example, we can train a model with pictures of you and your friend; when given new pictures, the model accurately identifies each of you.
The face recognition process consists of four steps: face detection, face alignment, feature extraction, and face matching.
Face detection is the key to building face recognition models. We need to build face detection models or use pre-built models like Haar Cascades, HOG classifiers, DNN Frontal Face detectors. Face Detection models locate faces in an image and draw a box around the faces.
Face alignment can be considered a normalization of the data, just as we normalize numerical data before training a model. In this step, the model identifies the geometric alignment of the face. It can also determine the shape of the face, eyes, nose, and so on.
In the feature extraction step, the model extracts features of the face, such as the shape or location of the face, eyes, and nose. These extracted features are what the model uses for recognition.
In the final step, the model matches the detected face against the faces in its prepared database and gives its prediction.