Convolution Neural Network
Before moving on to the CNN layers, please read the article for an in-depth description of the CNN (Convolutional Neural Network), including its history, architecture, applications, and how it works.
Basics
For example, suppose a laser sensor is used to track the location of an object. The sensor produces a single output, x(t), the position of the object at time t.
Now suppose the laser sensor is noisy. To obtain a less noisy estimate of the position, we can average several measurements together, giving more weight to recent ones. We do this with a weighting function w(a), where a is the age of a measurement.
When we apply such a weighted average at every instant, we obtain a new function s that provides a smoothed estimate of the object's position:
s(t) = ∫ x(a) w(t − a) da
This operation is called convolution, and it is typically denoted with an asterisk:
s(t) = (x ∗ w)(t)
The input is referred to as x, the kernel is denoted by w, and the output s is called the feature map.
Since t is discretized in practical implementations, we can write the operation as a sum:
s(t) = Σ_a x(a) w(t − a)
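The discrete weighted average can be sketched in a few lines of NumPy. This is only an illustration: the function name `smooth`, the sample readings, and the weights are made up for the example, and the sum is written in the equivalent age-indexed form s(t) = Σ_a x(t − a) w(a), where w[a] is the weight of a measurement that is a steps old.

```python
import numpy as np

def smooth(x, w):
    """Weighted average s(t) = sum_a x(t - a) * w(a).

    Only computed for the time steps t where all len(w) past readings exist.
    """
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    return np.array([
        sum(x[t - a] * w[a] for a in range(len(w)))
        for t in range(len(w) - 1, len(x))
    ])

# Hypothetical weights: the newest reading (age 0) counts the most.
w = np.array([0.5, 0.3, 0.2])
# Hypothetical sensor readings with one noisy spike at t = 2.
x = np.array([1.0, 1.0, 4.0, 1.0, 1.0])

s = smooth(x, w)
print(s)  # the spike is spread out and damped

# NumPy's built-in 1-D convolution computes the same values.
print(np.convolve(x, w, mode="valid"))
```

Because the weights sum to 1, a constant signal passes through unchanged, while an isolated spike is spread over several outputs with reduced height.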
Kernels
A small matrix called the kernel is used in a convolutional neural network (CNN) to extract features from the input image. The kernel is also referred to as a convolutional filter, or simply a filter.
CNN Architecture (Source: Wikipedia)
During the convolution operation, the kernel slides over the input image. At each position, the CNN multiplies the kernel element-wise with the local region of the image it is currently aligned with and sums the products (a dot product) to produce a single scalar value. Repeating this for every position where the kernel fits inside the input produces a new feature map.
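The sliding operation described above can be written out directly with NumPy loops. This is a minimal sketch for a single-channel image with no padding and a stride of 1; the function name `conv2d_valid` and the toy image and kernel are illustrative only. As in most deep learning libraries, the kernel is not flipped before it is applied.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide `kernel` over `image`; each output is the sum of element-wise products."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = image[i:i + kh, j:j + kw]   # local region under the kernel
            out[i, j] = np.sum(patch * kernel)  # dot product -> one scalar
    return out

# Toy 4x4 image whose pixel value is 4 * row + column.
image = np.arange(16, dtype=float).reshape(4, 4)
# Toy 2x2 kernel: top-left pixel minus bottom-right pixel.
kernel = np.array([[1.0, 0.0],
                   [0.0, -1.0]])

feature_map = conv2d_valid(image, kernel)
print(feature_map.shape)  # (3, 3): the kernel fits in 3 x 3 positions
print(feature_map)
```

In this toy image, every pixel differs from its lower-right neighbour by 5, so every entry of the feature map is -5.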
The weights in the kernel matrix are learned by the CNN during training. The training procedure aims to find the set of kernel weights that extracts the most relevant features from the input image for the given task, such as object recognition or image segmentation.
Horizontal and vertical kernels are two particular kernel types used for purposes such as edge and line detection.
Horizontal Kernel
A horizontal kernel is a kernel whose values are constant along each row and change from one row to the next. It is frequently used to find horizontal edges in images. Because a horizontal edge is a change in brightness along the vertical direction, the values in the kernel are typically negative in the row above the center, zero in the center row, and positive in the row below it (or the reverse).
An example of a 3x3 horizontal kernel is:
[[-1 -1 -1]
[ 0 0 0]
[ 1 1 1]]
Vertical Kernel
A vertical kernel, on the other hand, has values that are constant along each column and change from one column to the next. It is typically used to find vertical edges in images. Because a vertical edge is a change in brightness along the horizontal direction, the values in the kernel are typically negative in the column to the left of the center, zero in the center column, and positive in the column to its right (or the reverse).
For instance, the following can be used to define a vertical kernel of size 3x3:
[[-1 0 1]
[-1 0 1]
[-1 0 1]]
Edges at various orientations in an image can be detected by combining horizontal and vertical kernels. By applying both kernels to an input image, we can determine the magnitude and direction of the edge at each pixel. This information can then be used to extract features and carry out tasks like image segmentation and object recognition.
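As a rough illustration of combining the two kernels, the sketch below applies the 3x3 horizontal and vertical kernels shown earlier to a toy image and derives an edge magnitude and gradient direction per pixel. The helper `apply_kernel`, the toy image, and the use of `np.hypot` and `np.arctan2` to combine the responses are choices made for this example, not a prescribed method.

```python
import numpy as np

def apply_kernel(image, kernel):
    """Slide the kernel over the image (no padding, stride 1)."""
    kh, kw = kernel.shape
    out = np.zeros((image.shape[0] - kh + 1, image.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

horizontal = np.array([[-1, -1, -1],
                       [ 0,  0,  0],
                       [ 1,  1,  1]], dtype=float)
vertical = np.array([[-1, 0, 1],
                     [-1, 0, 1],
                     [-1, 0, 1]], dtype=float)

# Toy image: dark left half, bright right half, i.e. one vertical edge.
image = np.zeros((5, 6))
image[:, 3:] = 1.0

gy = apply_kernel(image, horizontal)  # responds to horizontal edges
gx = apply_kernel(image, vertical)    # responds to vertical edges

magnitude = np.hypot(gx, gy)          # edge strength at each position
direction = np.arctan2(gy, gx)        # gradient direction in radians

print(gx)
print(gy)
print(magnitude)
```

For this image the horizontal kernel gives zero everywhere, while the vertical kernel responds only in the windows that straddle the boundary between the dark and bright halves.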
CNN Layers
Important characteristics of the convolution process
- Size of Kernels
- Strides
- Zero Padding
- Size of Kernels: Each kernel has a window size, which defines its receptive field. To produce each value in its activation map, the kernel performs a convolution operation on a region of the input that matches its window size.
- Stride: The stride is the kernel's step size, that is, how many pixels it advances to reach the next position. With a stride of 1, the kernel shifts one pixel at a time across the input volume until it reaches the input's border. A larger stride therefore reduces the dimensions of the activation maps (the larger the stride, the smaller the activation maps).
- Zero padding: The zero-padding option sets how many zeros are added around the input's border. It is useful for preserving the input's spatial dimensions in the output.
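The combined effect of kernel size, stride, and zero padding on the output size can be captured in one formula: for an input of size n, a kernel of size k, padding p on each side, and stride s, the output size is floor((n + 2p − k) / s) + 1. The small helper below (the name `output_size` is made up for this sketch) evaluates it for a few settings.

```python
def output_size(n, k, stride=1, padding=0):
    """Output length along one spatial dimension of a convolution or pooling layer."""
    return (n + 2 * padding - k) // stride + 1

# 32-pixel input, 3x3 kernel, no padding: the map shrinks by 2.
print(output_size(32, 3))                       # 30
# One pixel of zero padding keeps the input size.
print(output_size(32, 3, padding=1))            # 32
# A stride of 2 roughly halves the size.
print(output_size(32, 3, stride=2, padding=1))  # 16
# A 2x2 pooling window with stride 2 halves a 30-pixel map.
print(output_size(30, 2, stride=2))             # 15
```

The formula is applied independently to height and width.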
Sparse Connectivity
- Making the kernel smaller than the input results in sparse interactions, also referred to as sparse connectivity or sparse weights.
- Fewer parameters need to be stored, which lowers the model's memory requirements and improves its statistical efficiency.
- The output computation needs fewer operations.
- Units in the deeper layers of a deep convolutional network may indirectly interact with more of the input.
In a CNN, the receptive field of the units in the deeper layers is greater than that of the units in the shallow layers. Even though a convolutional net has very few direct connections, units in the deeper layers may be indirectly related to all or the majority of the input image.
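The growth of the receptive field with depth can be computed layer by layer. The sketch below uses the standard recurrence (each layer adds (k − 1) times the current step between neighbouring units, and each stride multiplies that step); the function name and the example stacks are hypothetical.

```python
def receptive_field(layers):
    """Receptive field (in input pixels) of one unit after a stack of layers.

    `layers` is a list of (kernel_size, stride) pairs, applied in order.
    """
    field, jump = 1, 1          # jump: input-pixel distance between adjacent units
    for kernel_size, stride in layers:
        field += (kernel_size - 1) * jump
        jump *= stride
    return field

# Three stacked 3x3 convolutions with stride 1.
print(receptive_field([(3, 1)] * 3))  # 7

# Hypothetical stack: 3x3 convolution followed by 2x2 pooling (stride 2), three times.
stack = [(3, 1), (2, 2)] * 3
print(receptive_field(stack))         # 22
```

Although each unit connects directly to only a 3x3 or 2x2 window of the previous layer, a unit at the end of the second stack depends on a 22x22 patch of the input.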
Pooling Layer
A pooling function replaces the output of the net at a certain location with a summary statistic of the nearby outputs.
- Reduces the size of the input while preserving as much information as possible.
- Uses a pooling window size, zero-padding, and stride as hyperparameters.
- Scans the whole input with the pooling window, in the same way a kernel does in a convolutional layer.
Different types of pooling include:
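- Max pooling: takes the maximum value in a neighborhood.
- Average pooling: takes the average of the values in a neighborhood.
- Min pooling: takes the minimum value in a neighborhood.
- Stochastic pooling: samples one value from the neighborhood with probability proportional to its size (size-biased sampling). A related variant, fractional max-pooling, allows non-integer reduction ratios between layer sizes.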
The pooling layer makes the representation approximately invariant to small translations of the input, which can improve the model's ability to generalize.
Invariance to local translation can be a very helpful property if we care more about whether a feature is present than about exactly where it is.
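The max, average, and min variants can be sketched with a single NumPy helper. This version assumes non-overlapping windows (stride equal to the window size) on a single 2D feature map; the name `pool2d` and the toy input are illustrative only.

```python
import numpy as np

def pool2d(x, size=2, mode="max"):
    """Non-overlapping pooling over a 2D array (stride == window size)."""
    h, w = x.shape
    h, w = h - h % size, w - w % size          # drop rows/cols that do not fill a window
    windows = x[:h, :w].reshape(h // size, size, w // size, size)
    if mode == "max":
        return windows.max(axis=(1, 3))
    if mode == "avg":
        return windows.mean(axis=(1, 3))
    if mode == "min":
        return windows.min(axis=(1, 3))
    raise ValueError(f"unknown pooling mode: {mode}")

x = np.array([[1, 3, 2, 0],
              [4, 2, 1, 1],
              [0, 0, 5, 6],
              [1, 2, 7, 8]], dtype=float)

print(pool2d(x, mode="max"))  # [[4. 2.] [2. 8.]]
print(pool2d(x, mode="avg"))  # [[2.5 1.] [0.75 6.5]]
print(pool2d(x, mode="min"))  # [[1. 0.] [0. 5.]]
```

Each 2x2 block of the 4x4 input collapses to one value, so the feature map is halved in both dimensions.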
Implementation
Convolution in deep learning is related, though not identical, to convolution in signal processing. In deep learning it is simply an operation in which a kernel is multiplied element-wise with a portion of an input tensor and the results are added up to produce a single value. Simple as that sounds, it is not always easy to grasp!
Let's walk through an example that puts the essential CNN concepts together so that we can build something fun with them.
- Step 1: First, we must import the required libraries, including Matplotlib, Keras, and Numpy.
import numpy as np
import keras
import matplotlib.pyplot as plt
- Step 2: Load the dataset.
The dataset that we intend to use for training and testing our CNN model must then be loaded. For computer vision tasks, a variety of datasets are available, including CIFAR-10, MNIST, and ImageNet. We'll use the CIFAR-10 dataset for this example, which has 60,000 32x32 color images divided into 10 classes with 6,000 images each.
from keras.datasets import cifar10
(train_images, train_labels), (test_images, test_labels) = cifar10.load_data()
- Step 3: Prepare the data.
We must preprocess the data before we can use it to train our CNN model. We normalize the pixel values to lie between 0 and 1 and convert the labels to a categorical (one-hot) format.
train_images = train_images / 255.0
test_images = test_images / 255.0
train_labels = keras.utils.to_categorical(train_labels, 10)
test_labels = keras.utils.to_categorical(test_labels, 10)
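To make the two preprocessing steps concrete, here is a small NumPy-only sketch of what they produce. The helper `one_hot` is a stand-in written for this example (it is not the Keras function), and the sample labels and pixel values are made up.

```python
import numpy as np

def one_hot(labels, num_classes):
    """Turn integer class labels into one-hot rows (illustrative stand-in)."""
    labels = np.asarray(labels).reshape(-1)   # accept shape (N,) or (N, 1)
    return np.eye(num_classes, dtype="float32")[labels]

# CIFAR-10 labels arrive as a column of integers with shape (N, 1).
labels = np.array([[3], [0], [9]])
encoded = one_hot(labels, 10)
print(encoded.shape)  # (3, 10)
print(encoded[0])     # 1.0 at index 3, zeros elsewhere

# Pixel values are 8-bit integers; dividing by 255 maps them into [0, 1].
pixels = np.array([0, 128, 255], dtype=np.uint8)
scaled = pixels / 255.0
print(scaled)
```

Each row of `encoded` sums to 1, which matches the 10-way softmax output used later in the model.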
- Step 4: Create the CNN model.
To build the CNN model, we can utilize the Sequential model from the Keras toolkit.
This model uses three convolutional layers, each followed by a max pooling layer. After flattening the output, a fully connected layer with dropout regularization is applied to it. Finally, a dense layer with a softmax activation function assigns the image to one of the 10 categories.
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Dropout
model = Sequential()
model.add(Conv2D(32, (3, 3), activation='relu', input_shape=(32, 32, 3)))
model.add(MaxPooling2D((2, 2)))
model.add(Conv2D(64, (3, 3), activation='relu'))
model.add(MaxPooling2D((2, 2)))
model.add(Conv2D(128, (3, 3), activation='relu'))
model.add(MaxPooling2D((2, 2)))
model.add(Flatten())
model.add(Dense(512, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(10, activation='softmax'))
- Step 5: Compile the model.
The model must then be compiled by specifying the loss function, optimizer, and evaluation metric.
model.compile(loss='categorical_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])
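For intuition about the loss being minimized, the sketch below computes softmax probabilities and the categorical cross-entropy for one-hot labels in plain NumPy. It is a simplified illustration with made-up inputs, not the Keras implementation; the clipping constant `eps` is an assumption added to avoid log(0).

```python
import numpy as np

def softmax(logits):
    """Convert raw scores to probabilities that sum to 1 along the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)  # for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Per-sample loss: -sum_c y_true[c] * log(y_pred[c])."""
    y_pred = np.clip(y_pred, eps, 1.0)
    return -np.sum(y_true * np.log(y_pred), axis=-1)

y_true = np.array([[0.0, 1.0, 0.0]])            # one-hot label: class 1

uniform = softmax(np.array([[1.0, 1.0, 1.0]]))  # no preference among 3 classes
confident = softmax(np.array([[0.0, 5.0, 0.0]]))

print(categorical_crossentropy(y_true, uniform))    # log(3), about 1.0986
print(categorical_crossentropy(y_true, confident))  # close to 0
```

The loss is large when the probability assigned to the true class is small and approaches 0 as that probability approaches 1.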
- Step 6: Train the model.
Now that the model is compiled, we can train it on the CIFAR-10 training set.
model.fit(train_images, train_labels, epochs=5, batch_size=64)
Obtained Output:
Epoch 1/5
782/782 [==============================] - 87s 112ms/step - loss: 1.4484 - accuracy: 0.4720
Epoch 2/5
782/782 [==============================] - 83s 106ms/step - loss: 1.1295 - accuracy: 0.5988
Epoch 3/5
782/782 [==============================] - 83s 106ms/step - loss: 0.9665 - accuracy: 0.6603
Epoch 4/5
782/782 [==============================] - 83s 106ms/step - loss: 0.8564 - accuracy: 0.7014
Epoch 5/5
782/782 [==============================] - 85s 109ms/step - loss: 0.7838 - accuracy: 0.7270
<keras.callbacks.History at 0x7f880f99c4c0>
- Step 7: Test the model.
Finally, we can evaluate our model's performance on the test set and print the test accuracy.
test_loss, test_acc = model.evaluate(test_images, test_labels)
print('Test accuracy:', test_acc)
Obtained Output:
313/313 [==============================] - 7s 21ms/step - loss: 0.8529 - accuracy: 0.7065
Test accuracy: 0.7064999938011169
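The step counts shown in the progress bars (782 during training, 313 during evaluation) follow from the dataset and batch sizes. The short check below assumes the standard CIFAR-10 split of 50,000 training and 10,000 test images, the batch size of 64 passed to fit, and the Keras default batch size of 32 for evaluate; the helper name is made up.

```python
import math

def steps_per_epoch(num_samples, batch_size):
    """Number of batches needed to see every sample once (last batch may be smaller)."""
    return math.ceil(num_samples / batch_size)

train_steps = steps_per_epoch(50_000, 64)  # training set, batch_size=64
test_steps = steps_per_epoch(10_000, 32)   # test set, default batch size of 32

print(train_steps)  # 782
print(test_steps)   # 313
```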
model.summary() prints a layer-by-layer summary of the model, and plot_model renders the same architecture as a diagram. Note that the implementation described above is merely an example and can be changed to suit the requirements of the problem at hand.
model.summary()
Obtained Output:
Model: "sequential_1"
_________________________________________________________________
 Layer (type)                     Output Shape          Param #
=================================================================
 conv2d_4 (Conv2D)                (None, 30, 30, 32)    896
 max_pooling2d_4 (MaxPooling2D)   (None, 15, 15, 32)    0
 conv2d_5 (Conv2D)                (None, 13, 13, 64)    18496
 max_pooling2d_5 (MaxPooling2D)   (None, 6, 6, 64)      0
 conv2d_6 (Conv2D)                (None, 4, 4, 128)     73856
 max_pooling2d_6 (MaxPooling2D)   (None, 2, 2, 128)     0
 flatten_1 (Flatten)              (None, 512)           0
 dense_2 (Dense)                  (None, 512)           262656
 dropout (Dropout)                (None, 512)           0
 dense_3 (Dense)                  (None, 10)            5130
=================================================================
Total params: 361,034
Trainable params: 361,034
Non-trainable params: 0
_________________________________________________________________
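The parameter counts in the summary can be reproduced by hand. A convolutional layer has one weight per kernel position and input channel, plus one bias, for each filter; a dense layer has one weight per input plus one bias for each output unit. The helper names below are made up for this check.

```python
def conv2d_params(kernel_h, kernel_w, in_channels, filters):
    """Weights (kernel_h * kernel_w * in_channels) plus one bias, per filter."""
    return (kernel_h * kernel_w * in_channels + 1) * filters

def dense_params(inputs, units):
    """One weight per input plus one bias, per output unit."""
    return (inputs + 1) * units

counts = [
    conv2d_params(3, 3, 3, 32),     # first Conv2D on RGB input
    conv2d_params(3, 3, 32, 64),    # second Conv2D
    conv2d_params(3, 3, 64, 128),   # third Conv2D
    dense_params(2 * 2 * 128, 512), # Dense after flattening 2x2x128 = 512 values
    dense_params(512, 10),          # output layer
]

print(counts)       # [896, 18496, 73856, 262656, 5130]
print(sum(counts))  # 361034
```

Pooling, flatten, and dropout layers contribute no parameters, which is why they show 0 in the summary.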
from keras.utils import plot_model
plot_model(model, show_shapes=True, show_layer_names=True)
Obtained Output: (a diagram of the model's layers with their input and output shapes)
Key Points to Remember
There are several important points to keep in mind when using Convolutional Neural Networks (CNNs) in deep learning:
- Convolutional Layers: CNNs mainly rely on convolutional layers, which apply filters to the input data to find local patterns and features. In images and other structured data, these layers are good at capturing spatial relationships.
- Pooling Layers: Pooling layers downsample the feature maps produced by convolutional layers, reducing the spatial dimensions while keeping the key characteristics. Max pooling and average pooling are the two most frequently used pooling operations.
- Activation Functions: Activation functions introduce non-linearity into the network, which lets it learn complicated relationships. The Rectified Linear Unit (ReLU) and its variations, such as Leaky ReLU and Parametric ReLU, are commonly used activation functions in CNNs.
- Fully Connected Layers: CNNs frequently end with one or more fully connected layers. These layers connect every neuron in one layer to every neuron in the next and combine the extracted features to perform classification.
- Loss Functions: Loss functions measure the discrepancy between predicted and actual values and are what the network is trained to minimize. The loss function is chosen according to the task at hand, such as cross-entropy for classification or mean squared error for regression.
- Backpropagation: CNNs use backpropagation to compute the gradients of the loss function with respect to the network parameters. The network's weights and biases are then updated using optimization techniques like stochastic gradient descent (SGD) or its variants (such as Adam and RMSprop).
- Regularization Techniques: Overfitting is a frequent problem in deep learning. To avoid overfitting and improve generalization, regularization techniques like dropout and weight decay are frequently used.
- Data Augmentation: Data augmentation techniques are applied to improve CNNs and lessen overfitting. These techniques create new training examples by rotating, translating, flipping, and zooming the existing data.
- Transfer Learning: To tackle new problems with little labeled data, transfer learning reuses CNN models pre-trained on huge datasets, like ImageNet. This approach transfers the learned features and can reduce computation time and resources.
- Hyperparameter Tuning: Tuning a CNN includes adjusting the learning rate, batch size, number of layers, filter sizes, and other factors. Tuning these hyperparameters properly is essential for good performance and convergence.
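As one concrete example from the list above, two of the data augmentation operations (flipping and translating) can be sketched with NumPy alone. The function names, the zero fill for uncovered pixels, and the toy batch are assumptions made for this illustration; image batches are assumed to have shape (N, height, width, channels).

```python
import numpy as np

def horizontal_flip(batch):
    """Mirror every image left to right (reverse the width axis)."""
    return batch[:, :, ::-1, :]

def _spans(shift, size):
    """Source and destination slices for shifting by `shift` along an axis of length `size`."""
    src = slice(max(0, -shift), max(0, min(size, size - shift)))
    dst = slice(max(0, shift), max(0, min(size, size + shift)))
    return src, dst

def translate(batch, dy, dx):
    """Shift images down by dy and right by dx pixels, filling uncovered pixels with zeros."""
    _, h, w, _ = batch.shape
    out = np.zeros_like(batch)
    src_rows, dst_rows = _spans(dy, h)
    src_cols, dst_cols = _spans(dx, w)
    out[:, dst_rows, dst_cols, :] = batch[:, src_rows, src_cols, :]
    return out

# Toy batch: two 4x4 single-channel "images".
batch = np.arange(32, dtype=float).reshape(2, 4, 4, 1)

flipped = horizontal_flip(batch)
shifted = translate(batch, dy=1, dx=0)    # move content one pixel down
shifted_left = translate(batch, dy=0, dx=-1)

print(batch[0, :, :, 0])
print(flipped[0, :, :, 0])
print(shifted[0, :, :, 0])
```

The labels stay unchanged for such transformations, so each augmented image can be added to the training set with its original label.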
Conclusion
To improve the performance of the model, we can experiment with various hyperparameters, such as the learning rate, batch size, and number of epochs, or try different architectures, such as adding more convolutional layers or using different regularization methods.
Overall, the CNN model we constructed offers a decent starting point for image classification tasks on the CIFAR-10 dataset and may be customized for additional image classification tasks.