Convolutional Neural Network
Basics
CNNs are designed to process data that comes in the form of multiple arrays, such as a color image made up of three 2D arrays storing pixel intensities in the three color channels. Many data modalities take this form: 1D for signals and sequences, including language; 2D for images or audio spectrograms; and 3D for video or volumetric images.
CNNs are based on the concept of convolution, a mathematical operation that combines two functions to produce a third function expressing how the shape of one is modified by the other.
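In the discrete 2D setting used by CNN layers, this amounts to sliding a small kernel K over an input I and summing element-wise products at each position (note that most deep learning libraries actually implement cross-correlation, i.e., the kernel is not flipped). A rough sketch of the value at output position (i, j):

$$(I \ast K)(i, j) = \sum_{m}\sum_{n} I(i+m,\, j+n)\, K(m, n)$$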
CNNs have gained popularity in recent years due to their outstanding performance in image classification, object detection, and other computer vision tasks. They are used in many applications, such as self-driving cars, facial recognition systems, and medical imaging.
History
CNNs have been around since the late 1980s, when Yann LeCun, a French computer scientist, started working on a neural network design that could recognize handwritten digits. His early work led to the LeNet-5 architecture, which was used to read zip codes, cheques, and other handwritten data.
However, CNNs were not widely recognized until the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) in 2012, when a team led by Alex Krizhevsky, Ilya Sutskever, and Geoffrey Hinton achieved a dramatic improvement in image classification accuracy by using a deep convolutional neural network known as AlexNet.
Architecture
A typical CNN architecture is organized as a sequence of stages. The initial stages are made up of two kinds of layers:
- convolutional layers
- pooling layers
These layers make use of the fact that many natural signals are compositional hierarchies, in which higher-level features are obtained by composing lower-level ones.
- Local combinations of edges produce motifs in images, motifs assemble into parts, and parts form objects. In speech and text, similar hierarchies exist, ranging from sounds to phones, phonemes, syllables, words, and sentences.
Convolutional Layer
- A crucial building block of a convolutional neural network, analogous to a hidden layer.
- Converts the input into a more abstract representation.
- Calculations between the input and the hidden neurons are performed using local connectivity.
- Slides over the input with at least one kernel, conducting a convolution operation between each input region and the kernel.
- The results are saved in activation maps, which can be viewed as the convolutional layer's output. Each kernel can function as a feature extractor, sharing its weights with all neurons.
The following are the primary characteristics of the convolution process:
1. Kernel size - Each kernel has a window size, which is also known as the receptive field. The kernel will conduct a convolution operation on a region matching its window size from the input and store the results in its activation map.
2. Stride - the number of pixels the kernel moves to reach its next position. If it is set to 1, each kernel shifts one pixel at a time across the input volume until it reaches the input border. As a result, the stride can be used to reduce the dimensions of the activation maps (the larger the stride, the smaller the activation maps).
3. Zero-padding - defines how many zeros are padded around the input's border. This is highly beneficial for preserving the input's spatial dimensions. The combined effect of these three hyperparameters on the output size is sketched below.
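As a minimal sketch of how kernel size, stride, and zero-padding determine the size of each activation map (the function name and example sizes below are purely illustrative; the formula itself is the standard one):

```python
def conv_output_size(input_size, kernel_size, stride=1, padding=0):
    """Spatial size of an activation map along one dimension (height or width)."""
    return (input_size - kernel_size + 2 * padding) // stride + 1

# A 32x32 input with a 5x5 kernel, stride 1, zero-padding 0 -> 28x28 activation map.
print(conv_output_size(32, 5, stride=1, padding=0))  # 28
# A larger stride shrinks the activation map.
print(conv_output_size(32, 5, stride=2, padding=0))  # 14
# Zero-padding of 1 with a 3x3 kernel preserves the input dimension.
print(conv_output_size(32, 3, stride=1, padding=1))  # 32
```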
CNN Layers
In order to extract features from input data, convolutional neural networks (CNNs) combine a variety of layer types. These are the primary types of layers that CNN architectures typically employ:
- Convolutional Layer: The convolutional layer is the primary structural component of a CNN. It carries out convolution operations by applying filters (also referred to as kernels) to the input data. Each filter picks up a particular pattern or feature in the input. Convolutional layers help capture spatial relationships in images and other grid-like data.
- Pooling Layer: Pooling layers reduce the input's spatial dimensions (width and height) while preserving significant features. The most popular pooling method is max pooling, which selects the highest value within a sliding window. Pooling makes the learned features more robust to changes in position and reduces the computational complexity of the network.
- Activation Layer: An activation layer introduces non-linearity into the network by applying an element-wise activation function to the output of the preceding layer. The most commonly used activation function is the Rectified Linear Unit (ReLU), which sets negative values to zero while keeping positive values unchanged. Because activation layers introduce non-linearities, CNNs can model complicated relationships in the data.
- Fully Connected Layer: Dense, or fully connected, layers link every neuron in the current layer to every neuron in the previous layer. They are typically placed at the end of a CNN to convert the learned features into the required output format (for example, class probabilities in classification tasks). Fully connected layers allow the network to generate predictions from the learned representations.
- Dropout Layer: Dropout is a regularization method used to avoid overfitting. During training, dropout layers randomly set a fraction of the input units to zero, preventing the network from becoming overly dependent on particular features and promoting the learning of more robust representations.
These are the foundational layers of CNN architectures. Depending on the architecture and the task at hand, they may be arranged and combined in different ways. Several convolutional layers are typically followed by pooling, activation, and fully connected layers, giving CNNs their characteristic hierarchical structure. This structure enables CNNs to learn progressively more complex and abstract features from the input data.
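To make the stacking concrete, here is a minimal sketch of such a hierarchy in PyTorch (the layer sizes, the 32x32 RGB input, and the class name SmallCNN are illustrative assumptions, not a prescribed architecture):

```python
import torch
import torch.nn as nn

class SmallCNN(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1),  # convolutional layer
            nn.ReLU(),                                   # activation layer
            nn.MaxPool2d(2),                             # pooling layer
            nn.Conv2d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Dropout(0.5),                      # dropout layer (regularization)
            nn.Linear(32 * 8 * 8, num_classes),   # fully connected layer
        )

    def forward(self, x):
        return self.classifier(self.features(x))

model = SmallCNN()
out = model(torch.randn(1, 3, 32, 32))  # one hypothetical 32x32 RGB image
print(out.shape)                        # torch.Size([1, 10])
```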
CNN Calculations with an Example
An example of computing the values in the activation map with a stride of 1 and zero-padding of 0: the kernel moves one pixel at a time from left to right, beginning at the top-left position and progressing to the border. Once at the border, the kernel moves down one row and repeats the procedure until the entire input is covered.
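A small worked version of this calculation in NumPy, using a hypothetical 4x4 input and 3x3 kernel (the values are made up purely for illustration):

```python
import numpy as np

x = np.array([[1, 0, 2, 3],      # hypothetical 4x4 input
              [4, 6, 6, 8],
              [3, 1, 1, 0],
              [1, 2, 2, 4]])
k = np.array([[0, 1, 0],         # hypothetical 3x3 kernel
              [1, 1, 1],
              [0, 1, 0]])

out_size = x.shape[0] - k.shape[0] + 1        # (4 - 3) / 1 + 1 = 2
activation_map = np.zeros((out_size, out_size))
for i in range(out_size):                     # slide top to bottom
    for j in range(out_size):                 # slide left to right
        region = x[i:i + 3, j:j + 3]          # region matching the kernel's window
        activation_map[i, j] = np.sum(region * k)  # element-wise multiply, then sum

print(activation_map)   # [[17. 23.]
                        #  [13. 10.]]
```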
Pooling Layer
- Typically placed between a convolutional layer and the layer after it.
- Attempts to reduce the size of the input while preserving as much information as possible.
- Can introduce spatial invariance into the network, which aids model generalization.
- Scans the entire input with the chosen pooling window, in the same way that a convolutional layer's kernel does.
- Its hyperparameters are the stride, zero-padding, and pooling window size.
- Pooling techniques include max-pooling, average pooling, min-pooling, fractional max-pooling, and stochastic pooling; max- and average pooling are sketched below.
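A brief sketch of max pooling and average pooling in PyTorch, applied to a made-up 4x4 input with a 2x2 window and stride 2:

```python
import torch
import torch.nn as nn

x = torch.tensor([[[[1., 3., 2., 4.],     # shape (batch=1, channels=1, 4, 4)
                    [5., 6., 7., 8.],
                    [3., 2., 1., 0.],
                    [1., 2., 3., 4.]]]])

max_pool = nn.MaxPool2d(kernel_size=2, stride=2)
avg_pool = nn.AvgPool2d(kernel_size=2, stride=2)

print(max_pool(x))  # [[6., 8.], [3., 4.]]       -- keeps the largest value per window
print(avg_pool(x))  # [[3.75, 5.25], [2., 2.]]   -- averages each window
```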
Fully-Connected Layer
- The fundamental hidden-layer unit of feedforward networks.
- Added before the output layer to further model non-linear interactions among the input features.
- Alternative strategies, such as max-over-time pooling, can take the place of linear layers (a brief sketch follows below).
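As a rough illustration of max-over-time pooling, the sketch below applies a 1D convolution to a hypothetical batch of word embeddings and keeps only each filter's strongest response over the sequence (all sizes here are arbitrary assumptions):

```python
import torch
import torch.nn as nn

batch, embed_dim, seq_len, num_filters, num_classes = 2, 50, 20, 64, 3
x = torch.randn(batch, embed_dim, seq_len)      # (batch, channels, time)

conv = nn.Conv1d(embed_dim, num_filters, kernel_size=3)
feature_maps = torch.relu(conv(x))              # (batch, num_filters, seq_len - 2)
pooled, _ = feature_maps.max(dim=2)             # max over the time axis -> (batch, num_filters)
logits = nn.Linear(num_filters, num_classes)(pooled)
print(logits.shape)                             # torch.Size([2, 3])
```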
Working
CNNs operate by applying a collection of learned filters to an input image or data to extract features at various levels of abstraction. In a process known as convolution, the filters slide across the image, producing a feature map that highlights portions of the image that are relevant to the task at hand.
These feature maps are then processed through a variety of layers, including pooling and activation layers, to lower the data's dimensionality and boost its non-linearity. The network's final output is a set of probabilities representing the likelihood that the input image belongs to each class in the dataset.
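A minimal sketch of this flow in PyTorch, tracing how a hypothetical 32x32 RGB image is turned into class probabilities (the layer sizes and the 10-class output are illustrative assumptions):

```python
import torch
import torch.nn as nn

x = torch.randn(1, 3, 32, 32)                        # one hypothetical RGB image
x = torch.relu(nn.Conv2d(3, 16, 3, padding=1)(x))    # feature maps: (1, 16, 32, 32)
x = nn.MaxPool2d(2)(x)                               # reduced:      (1, 16, 16, 16)
x = nn.Flatten()(x)                                  # vector:       (1, 4096)
logits = nn.Linear(16 * 16 * 16, 10)(x)              # class scores: (1, 10)
probs = torch.softmax(logits, dim=1)                 # probabilities over the 10 classes
print(probs.sum())                                   # approximately 1.0
```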
Applications
CNNs (convolutional neural networks) have a wide range of applications in computer vision and image processing tasks. The following are a few common CNN applications:
- Image Classification: One of the most prominent applications of CNNs is image classification, in which the aim is to categorize an image into one of several predetermined categories. CNNs can learn to recognize visual patterns and properties useful for classification, such as edges, shapes, and textures.
- Object Detection: CNNs are also used for object detection, which aims to recognize and localize objects inside an image. This is especially beneficial in applications like autonomous driving, where the system must recognize and track other vehicles, pedestrians, and obstacles in real time.
- Facial Recognition: CNNs are used for facial recognition, the process of recognizing and verifying a person's identity based on their facial features. Security, law enforcement, and access control systems can all benefit from this.
- Medical Imaging: CNNs are used in medical imaging to analyze MRI and CT data in order to detect abnormalities and diagnose diseases. They are also employed in image segmentation, the process of separating an image into different regions in order to identify particular structures or tissues.
- Natural Language Processing (NLP): CNNs are also employed in NLP applications such as text categorization and sentiment analysis. They can learn to recognize patterns and properties in text data, such as word frequencies and grammatical structures, and use them for classification and prediction tasks.
- Video Analysis: CNNs are employed in video analysis, with the purpose of recognizing and tracking objects, people, and events in a video stream. This can be used for surveillance, traffic monitoring, and sports analysis.
FAQs
1. What is a convolutional neural network (CNN)?
A) A convolutional neural network (CNN) is a deep learning model designed specifically for processing grid-like data, such as images. It draws inspiration from the way the human brain processes visual information. CNNs use convolutional layers to extract local features and hierarchical representations from the input data.
2. How does the CNN architecture function?
A) A CNN architecture generally consists of convolutional layers, pooling layers, activation layers, and fully connected layers. The convolutional layers apply filters to the incoming data and use convolution operations to extract features. Pooling layers reduce the spatial dimensions while keeping significant features. Activation layers contribute non-linearities, and fully connected layers generate predictions based on the learned representations.
3. What are the benefits of employing a CNN?
A) CNNs have a number of benefits for deep learning:
- CNNs automatically learn relevant features from the raw input data, without the need for manual feature engineering.
- They are well suited to image and video analysis tasks because they are good at capturing spatial relationships and patterns in grid-like data.
- CNNs can learn hierarchical representations, which enables them to model complex and abstract features.
- Because convolutional layers share weights, CNNs have fewer parameters, which increases their computational efficiency.
4. What are some of the most common CNN architectures?
A) There have been several widely used CNN architectures created, each with unique design decisions and applications. Among the notable architectures are:
- LeNet-5: An early CNN made specifically for reading handwritten numbers.
- AlexNet: Introduced in 2012, it won the ImageNet competition and brought widespread attention to CNNs.
- VGGNet: It has numerous variations (such as VGG16 and VGG19) and is renowned for its simplicity and homogeneous architecture.
- ResNet: Short for Residual Network; introduced residual connections to address the vanishing gradient problem and make it possible to train very deep neural networks.
- InceptionNet (also known as GoogLeNet): Introduced the inception module, which performs several convolutions in parallel.
- MobileNet: Designed for mobile and embedded devices with limited computational resources.
- EfficientNet: Achieved state-of-the-art performance by systematically scaling the network architecture across several dimensions.
5. How are CNNs trained?
A) CNNs are trained using labeled training data and a loss function. Optimization algorithms such as stochastic gradient descent (SGD) iteratively update the model's parameters (weights and biases) to reduce the loss. Training alternates between forward propagation, which computes the loss, and backward propagation (backpropagation), which computes the gradients used to update the parameters.
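A compact sketch of this loop in PyTorch with randomly generated "labeled" data (the tiny model, batch size, learning rate, and epoch count are all illustrative assumptions):

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
                      nn.MaxPool2d(2), nn.Flatten(), nn.Linear(8 * 16 * 16, 10))
criterion = nn.CrossEntropyLoss()                          # loss function
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)   # stochastic gradient descent

images = torch.randn(4, 3, 32, 32)           # dummy labeled training batch
labels = torch.randint(0, 10, (4,))

for epoch in range(5):
    optimizer.zero_grad()
    loss = criterion(model(images), labels)  # forward propagation computes the loss
    loss.backward()                          # backpropagation computes the gradients
    optimizer.step()                         # update the weights and biases
```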
6. What tasks can CNNs be used for?
A) CNNs are mostly used for tasks involving grid-like data, including:
- Image classification: assigning an image a label from a set of predetermined categories.
- Object detection: finding and localizing various objects within an image.
- Image segmentation: assigning a class label to each pixel in an image to distinguish between different objects or regions.
- Image generation: creating new images based on learned patterns and attributes.
- Video analysis: analyzing and understanding videos, including tasks such as action recognition and video captioning.
An Easy Way to Remember
- CNNs have been used successfully for the detection, segmentation, and recognition of objects and regions in images, such as face recognition.
- Medical image analysis, such as brain cancer detection.
- Document analysis, such as handwriting.
- Understanding climate.
- Advertising, such as programmatic buying and data-driven personalized advertising.
- Additional applications include driverless automobiles, robots that can mimic human behavior, assistance with human genome mapping studies, and earthquake and natural disaster prediction.
Conclusion
CNNs are a type of deep neural network, widely used in deep learning and machine learning, that has transformed the field of computer vision. They use the principle of convolution to extract relevant features from high-dimensional data such as images and videos. With their excellent performance and versatility, CNNs are likely to remain a key area of study and development in the coming years. They are used for a variety of tasks, including image classification, object detection, facial recognition, medical imaging, natural language processing, and video analysis, and they are an effective tool for evaluating and extracting relevant patterns and features from high-dimensional data.
References:
[1] Krizhevsky et al., Proc. Advances in Neural Information Processing Systems, 2012.
[2] LeCun et al., Nature, 2015.
[3] wikipedia.com