ResNet
Introduction
Residual Network, or simply ResNet, is a deep neural network architecture introduced in 2015 by Kaiming He et al. ResNet is known for making it practical to train networks considerably deeper than was previously feasible, by mitigating the vanishing gradient problem that arises when training very deep neural networks.
The main idea of ResNet is the residual connection, which lets the input of a group of layers bypass those layers and be added directly to their output. The layers then only have to learn the difference between the desired output and the input, rather than the entire mapping from input to output. Because the skip path carries gradients back unchanged, the gradients shrink less as they travel back through the layers, which makes very deep networks simpler to train.
ResNet is frequently used in computer vision applications, especially for image classification and object detection. When it was introduced, it outperformed earlier architectures on several benchmark datasets. ResNet also comes in several variants of different depths, such as ResNet-50, ResNet-101, and ResNet-152.
History
ResNet's origins lie in the early stages of deep learning research, when researchers were trying to create neural network architectures that could perform intricate tasks, such as image recognition, with high accuracy.
One of the difficulties faced by deep neural networks is the issue of vanishing gradients, which arises when the gradients used to update the weights during training become very small as they travel backwards through the network. Because the gradients can become too small to properly update the weights of the earlier layers, very deep networks can be challenging to train.
To solve this issue, researchers started experimenting with architectures that could allow deeper networks to be trained. One such strategy was the skip connection, which lets information bypass some layers and propagate directly to a later layer.
Bengio et al. first established the idea of skip connections in a 1998 study, but it was ResNet's debut in 2015 that marked the beginning of their widespread adoption. Kaiming He et al. proposed ResNet as a technique for training very deep neural networks using residual connections, a form of skip connection that lets a set of layers learn the residual function between its input and its output.
The original ResNet study, which produced state-of-the-art results on a number of benchmark datasets, showed that residual connections allow far deeper networks to be trained than was previously feasible. Since then, ResNet has gained popularity and is frequently used in deep learning research and applications. It also prompted further research into other architectures that can facilitate the training of even deeper neural networks.
Architecture
The ResNet architecture is built on residual connections, which let the input of a group of layers bypass those layers and be added directly to their output. The network is organized into blocks, each of which typically consists of several convolutional layers wrapped by one residual connection.
The residual block is the fundamental building block of the ResNet architecture. It consists of two or three convolutional layers whose output is added to the block's input through a residual connection. Thanks to this connection, the block learns the difference between its desired output and its input instead of the entire mapping from input to output.
ResNet's overall design depends on the variant being used, but it typically consists of several stages of residual blocks, followed by a global average pooling layer and a fully connected layer for classification. Some variants also add extra layers, such as batch normalization or dropout layers, to improve performance and prevent overfitting.
Residual Block
- f(x) is the desired underlying mapping that we are trying to learn; its output serves as the input to the activation function at the top of the block.
- The portion inside the dotted-line box only has to learn the residual mapping f(x) - x.
- The solid line that carries the block input x to the addition operator is the residual connection.
- Two 3x3 convolutional layers with the same number of output channels make up the residual block. Each convolutional layer is followed by a batch normalization layer and a ReLU activation function.
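The computation described in the list above can be sketched in a few lines of plain Python. This is only an illustrative sketch under simplifying assumptions: small dense (matrix) layers stand in for the 3x3 convolutions, batch normalization is left out, and the helper names `relu`, `linear`, and `residual_block` are made up for this example.

```python
def relu(values):
    """Element-wise ReLU on a list of floats."""
    return [max(0.0, v) for v in values]


def linear(weights, values):
    """Multiply a weight matrix (given as a list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, values)) for row in weights]


def residual_block(x, w1, w2):
    """Return relu(g(x) + x), where g is two linear layers with a ReLU between."""
    g = linear(w2, relu(linear(w1, x)))             # residual branch, models f(x) - x
    return relu([gi + xi for gi, xi in zip(g, x)])  # add the shortcut, then activate


x = [1.0, 2.0, 3.0]
zeros = [[0.0] * 3 for _ in range(3)]

# With all-zero weights the residual branch outputs 0, so the block
# reduces to the identity mapping (x is non-negative, so ReLU keeps it).
y = residual_block(x, zeros, zeros)
print(y)  # [1.0, 2.0, 3.0]
```

The zero-weight case shows why residual blocks are convenient for deep networks: a block can represent the identity mapping simply by driving its weights toward zero.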
ResNet-50, one of the best-known ResNet variants, contains 50 layers. ResNet models achieved state-of-the-art performance in the 2015 ImageNet Large Scale Visual Recognition Challenge (ILSVRC). ResNet-101 and ResNet-152 are two other common, deeper versions of the architecture.
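The numbers in these variant names can be reproduced from the number of bottleneck blocks in each of the four stages. The sketch below assumes the usual counting convention (one stem convolution, three convolutions per bottleneck block, one final fully connected layer, with shortcut convolutions not counted); the names `STAGE_BLOCKS` and `weighted_layer_count` are made up for this example.

```python
# Number of bottleneck blocks in each of the four stages.
STAGE_BLOCKS = {
    "ResNet-50": (3, 4, 6, 3),
    "ResNet-101": (3, 4, 23, 3),
    "ResNet-152": (3, 8, 36, 3),
}

CONVS_PER_BOTTLENECK = 3  # 1x1, 3x3, and 1x1 convolutions


def weighted_layer_count(blocks_per_stage):
    """Count the stem convolution, the block convolutions, and the dense layer."""
    return 1 + CONVS_PER_BOTTLENECK * sum(blocks_per_stage) + 1


for name, blocks in STAGE_BLOCKS.items():
    print(name, weighted_layer_count(blocks))
```

Running it prints 50, 101, and 152 for the three variants.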
Overall, it has been demonstrated that the ResNet design performs exceptionally well for computer vision tasks, making it one of the most popular architectures in deep learning research and applications.
Working
The ResNet design uses residual connections to make it possible to train very deep neural networks. These connections let information bypass some layers and pass directly from one layer to a later one, which keeps the gradients from becoming too small as they propagate backwards through the network during training.
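The effect on the gradients can be illustrated with a toy calculation. The sketch below is not a real network: it assumes a chain of scalar linear layers that all share one small weight, and the function names are made up for this example. In a plain chain each layer multiplies the gradient by the weight, while in a residual chain each layer multiplies it by one plus the weight.

```python
def plain_gradient(weight, depth):
    """d(output)/d(input) for `depth` stacked layers of the form h -> weight * h."""
    grad = 1.0
    for _ in range(depth):
        grad *= weight
    return grad


def residual_gradient(weight, depth):
    """d(output)/d(input) for `depth` stacked layers of the form h -> h + weight * h."""
    grad = 1.0
    for _ in range(depth):
        grad *= 1.0 + weight  # the shortcut contributes the constant 1
    return grad


print(plain_gradient(0.01, 50))     # vanishes: on the order of 1e-100
print(residual_gradient(0.01, 50))  # stays close to 1 (about 1.64)
```

The shortcut adds a constant 1 to every per-layer factor, so the product no longer collapses toward zero when the individual layer derivatives are small.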
The fundamental principle of ResNet is that a series of layers learns the residual function, which is the difference between its desired output and its input, rather than the complete mapping from input to output. The residual connection carries the input forward unchanged, so the layers can concentrate on learning only the part of the mapping that differs from the identity.
During training, ResNet uses backpropagation to compute the gradients of the loss function with respect to the network weights. The weights are then updated using these gradients, with the goal of lowering the loss and improving the network's performance on the given task.
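As a minimal sketch of this training loop, consider a single scalar residual unit y = x + w * x trained with gradient descent on a squared-error loss. The unit, the learning rate, and the function name `train_residual_unit` are assumptions chosen for illustration; a real ResNet applies the same idea to millions of weights.

```python
def train_residual_unit(x, target, learning_rate=0.1, steps=50):
    """Fit w in y = x + w * x by gradient descent on the loss (y - target)**2."""
    w = 0.0
    losses = []
    for _ in range(steps):
        y = x + w * x                # forward pass through the residual unit
        error = y - target
        losses.append(error ** 2)    # squared-error loss
        grad_w = 2.0 * error * x     # backpropagation: d(loss)/dw
        w -= learning_rate * grad_w  # gradient descent update
    return w, losses


w, losses = train_residual_unit(x=1.0, target=3.0)
print(w, losses[0], losses[-1])
```

With x = 1 and a target of 3, the weight converges toward 2 and the loss shrinks at every step.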
The ResNet architecture has proven highly successful for computer vision applications such as image classification and object recognition. Because it makes extremely deep neural networks trainable, ResNet has become one of the most widely used designs in deep learning research and applications, helping to advance the state of the art in these tasks.
Applications
For computer vision applications including image classification, object recognition, and semantic segmentation, the ResNet architecture has been extensively employed. Here are a few examples of ResNet's particular uses:
- Image classification: ResNet achieved state-of-the-art results on the ImageNet Large Scale Visual Recognition Challenge (ILSVRC), among other benchmark datasets for image classification.
- Object detection: Models such as Faster R-CNN and Mask R-CNN commonly use ResNet as their backbone, and the Darknet-53 backbone of YOLOv3 adopts residual connections. These models achieved strong results on the COCO object detection dataset.
- Semantic segmentation: ResNet has been used for semantic segmentation, in which a class label is assigned to each pixel in an image. ResNet-based models have attained state-of-the-art performance on datasets like Cityscapes and PASCAL VOC.
- Transfer learning: ResNet has been employed as a pre-trained feature extractor for transfer learning, in which a model is first trained on a large dataset and then fine-tuned on a smaller dataset for a particular task. This method has achieved state-of-the-art performance on a variety of computer vision tasks.
Overall, the ResNet design has proven to be quite successful for a variety of computer vision tasks, making it one of the most popular architectures in deep learning research and applications.
Implementation
The ResNet50 model architecture is implemented below using the TensorFlow Keras API. ResNet50 is a 50-layer deep neural network that is frequently used for image classification tasks.
Source Code
# Import the necessary libraries
from tensorflow.keras.layers import (
    Activation, Add, BatchNormalization, Conv2D, Dense,
    GlobalMaxPooling2D, Input, MaxPooling2D,
)
from tensorflow.keras.models import Model


# Define the two kinds of bottleneck block used by ResNet50
def identity_block(x, filters):
    """Bottleneck block whose shortcut is the unchanged input."""
    f1, f2, f3 = filters
    x_shortcut = x

    # 1x1 convolution reduces the number of channels
    x = Conv2D(filters=f1, kernel_size=(1, 1), strides=(1, 1), padding='valid')(x)
    x = BatchNormalization(axis=3)(x)
    x = Activation('relu')(x)

    # 3x3 convolution works on the reduced representation
    x = Conv2D(filters=f2, kernel_size=(3, 3), strides=(1, 1), padding='same')(x)
    x = BatchNormalization(axis=3)(x)
    x = Activation('relu')(x)

    # 1x1 convolution restores the number of channels
    x = Conv2D(filters=f3, kernel_size=(1, 1), strides=(1, 1), padding='valid')(x)
    x = BatchNormalization(axis=3)(x)

    # Add the shortcut, then apply the final activation
    x = Add()([x, x_shortcut])
    x = Activation('relu')(x)
    return x


def conv_block(x, filters, strides):
    """Bottleneck block whose shortcut is projected by a 1x1 convolution."""
    f1, f2, f3 = filters
    x_shortcut = x

    x = Conv2D(filters=f1, kernel_size=(1, 1), strides=strides, padding='valid')(x)
    x = BatchNormalization(axis=3)(x)
    x = Activation('relu')(x)

    x = Conv2D(filters=f2, kernel_size=(3, 3), strides=(1, 1), padding='same')(x)
    x = BatchNormalization(axis=3)(x)
    x = Activation('relu')(x)

    x = Conv2D(filters=f3, kernel_size=(1, 1), strides=(1, 1), padding='valid')(x)
    x = BatchNormalization(axis=3)(x)

    # Project the shortcut so its shape matches the main path
    x_shortcut = Conv2D(filters=f3, kernel_size=(1, 1), strides=strides, padding='valid')(x_shortcut)
    x_shortcut = BatchNormalization(axis=3)(x_shortcut)

    x = Add()([x, x_shortcut])
    x = Activation('relu')(x)
    return x


# Define the ResNet50 model from the stem, four stages of blocks, and the classifier
def resnet50(input_shape=(224, 224, 3), classes=1000):
    x_input = Input(input_shape)

    # Stem: 7x7 convolution followed by 3x3 max pooling
    x = Conv2D(filters=64, kernel_size=(7, 7), strides=(2, 2), padding='valid')(x_input)
    x = BatchNormalization(axis=3)(x)
    x = Activation('relu')(x)
    x = MaxPooling2D(pool_size=(3, 3), strides=(2, 2))(x)

    # Stage 1: one conv_block and two identity_blocks
    x = conv_block(x, filters=[64, 64, 256], strides=(1, 1))
    for _ in range(2):
        x = identity_block(x, filters=[64, 64, 256])

    # Stage 2: one conv_block and three identity_blocks
    x = conv_block(x, filters=[128, 128, 512], strides=(2, 2))
    for _ in range(3):
        x = identity_block(x, filters=[128, 128, 512])

    # Stage 3: one conv_block and five identity_blocks
    x = conv_block(x, filters=[256, 256, 1024], strides=(2, 2))
    for _ in range(5):
        x = identity_block(x, filters=[256, 256, 1024])

    # Stage 4: one conv_block and two identity_blocks
    x = conv_block(x, filters=[512, 512, 2048], strides=(2, 2))
    for _ in range(2):
        x = identity_block(x, filters=[512, 512, 2048])

    # Classifier head. Note: the original ResNet paper uses global average
    # pooling here; this implementation uses global max pooling.
    x = GlobalMaxPooling2D()(x)
    x = Dense(classes, activation='softmax')(x)

    model = Model(inputs=x_input, outputs=x, name='resnet50')
    return model


# Summary of the model
model = resnet50()
model.summary()
Obtained Output:
Model: "resnet50"
__________________________________________________________________________________________________
Layer (type) Output Shape Param # Connected to
==================================================================================================
input_1 (InputLayer) [(None, 224, 224, 3 0 []
)]
conv2d (Conv2D) (None, 109, 109, 64 9472 ['input_1[0][0]']
)
batch_normalization (BatchNorm (None, 109, 109, 64 256 ['conv2d[0][0]']
alization) )
activation (Activation) (None, 109, 109, 64 0 ['batch_normalization[0][0]']
)
max_pooling2d (MaxPooling2D) (None, 54, 54, 64) 0 ['activation[0][0]']
conv2d_1 (Conv2D) (None, 54, 54, 64) 4160 ['max_pooling2d[0][0]']
batch_normalization_1 (BatchNo (None, 54, 54, 64) 256 ['conv2d_1[0][0]']
rmalization)
activation_1 (Activation) (None, 54, 54, 64) 0 ['batch_normalization_1[0][0]']
conv2d_2 (Conv2D) (None, 54, 54, 64) 36928 ['activation_1[0][0]']
batch_normalization_2 (BatchNo (None, 54, 54, 64) 256 ['conv2d_2[0][0]']
rmalization)
activation_2 (Activation) (None, 54, 54, 64) 0 ['batch_normalization_2[0][0]']
conv2d_3 (Conv2D) (None, 54, 54, 256) 16640 ['activation_2[0][0]']
conv2d_4 (Conv2D) (None, 54, 54, 256) 16640 ['max_pooling2d[0][0]']
batch_normalization_3 (BatchNo (None, 54, 54, 256) 1024 ['conv2d_3[0][0]']
rmalization)
batch_normalization_4 (BatchNo (None, 54, 54, 256) 1024 ['conv2d_4[0][0]']
rmalization)
add (Add) (None, 54, 54, 256) 0 ['batch_normalization_3[0][0]',
'batch_normalization_4[0][0]']
activation_3 (Activation) (None, 54, 54, 256) 0 ['add[0][0]']
conv2d_5 (Conv2D) (None, 54, 54, 64) 16448 ['activation_3[0][0]']
batch_normalization_5 (BatchNo (None, 54, 54, 64) 256 ['conv2d_5[0][0]']
rmalization)
activation_4 (Activation) (None, 54, 54, 64) 0 ['batch_normalization_5[0][0]']
conv2d_6 (Conv2D) (None, 54, 54, 64) 36928 ['activation_4[0][0]']
batch_normalization_6 (BatchNo (None, 54, 54, 64) 256 ['conv2d_6[0][0]']
rmalization)
activation_5 (Activation) (None, 54, 54, 64) 0 ['batch_normalization_6[0][0]']
conv2d_7 (Conv2D) (None, 54, 54, 256) 16640 ['activation_5[0][0]']
batch_normalization_7 (BatchNo (None, 54, 54, 256) 1024 ['conv2d_7[0][0]']
rmalization)
add_1 (Add) (None, 54, 54, 256) 0 ['batch_normalization_7[0][0]',
'activation_3[0][0]']
activation_6 (Activation) (None, 54, 54, 256) 0 ['add_1[0][0]']
conv2d_8 (Conv2D) (None, 54, 54, 64) 16448 ['activation_6[0][0]']
batch_normalization_8 (BatchNo (None, 54, 54, 64) 256 ['conv2d_8[0][0]']
rmalization)
activation_7 (Activation) (None, 54, 54, 64) 0 ['batch_normalization_8[0][0]']
conv2d_9 (Conv2D) (None, 54, 54, 64) 36928 ['activation_7[0][0]']
batch_normalization_9 (BatchNo (None, 54, 54, 64) 256 ['conv2d_9[0][0]']
rmalization)
activation_8 (Activation) (None, 54, 54, 64) 0 ['batch_normalization_9[0][0]']
conv2d_10 (Conv2D) (None, 54, 54, 256) 16640 ['activation_8[0][0]']
batch_normalization_10 (BatchN (None, 54, 54, 256) 1024 ['conv2d_10[0][0]']
ormalization)
add_2 (Add) (None, 54, 54, 256) 0 ['batch_normalization_10[0][0]',
'activation_6[0][0]']
activation_9 (Activation) (None, 54, 54, 256) 0 ['add_2[0][0]']
conv2d_11 (Conv2D) (None, 27, 27, 128) 32896 ['activation_9[0][0]']
batch_normalization_11 (BatchN (None, 27, 27, 128) 512 ['conv2d_11[0][0]']
ormalization)
activation_10 (Activation) (None, 27, 27, 128) 0 ['batch_normalization_11[0][0]']
conv2d_12 (Conv2D) (None, 27, 27, 128) 147584 ['activation_10[0][0]']
batch_normalization_12 (BatchN (None, 27, 27, 128) 512 ['conv2d_12[0][0]']
ormalization)
activation_11 (Activation) (None, 27, 27, 128) 0 ['batch_normalization_12[0][0]']
conv2d_13 (Conv2D) (None, 27, 27, 512) 66048 ['activation_11[0][0]']
conv2d_14 (Conv2D) (None, 27, 27, 512) 131584 ['activation_9[0][0]']
batch_normalization_13 (BatchN (None, 27, 27, 512) 2048 ['conv2d_13[0][0]']
ormalization)
batch_normalization_14 (BatchN (None, 27, 27, 512) 2048 ['conv2d_14[0][0]']
ormalization)
add_3 (Add) (None, 27, 27, 512) 0 ['batch_normalization_13[0][0]',
'batch_normalization_14[0][0]']
activation_12 (Activation) (None, 27, 27, 512) 0 ['add_3[0][0]']
conv2d_15 (Conv2D) (None, 27, 27, 128) 65664 ['activation_12[0][0]']
batch_normalization_15 (BatchN (None, 27, 27, 128) 512 ['conv2d_15[0][0]']
ormalization)
activation_13 (Activation) (None, 27, 27, 128) 0 ['batch_normalization_15[0][0]']
conv2d_16 (Conv2D) (None, 27, 27, 128) 147584 ['activation_13[0][0]']
batch_normalization_16 (BatchN (None, 27, 27, 128) 512 ['conv2d_16[0][0]']
ormalization)
activation_14 (Activation) (None, 27, 27, 128) 0 ['batch_normalization_16[0][0]']
conv2d_17 (Conv2D) (None, 27, 27, 512) 66048 ['activation_14[0][0]']
batch_normalization_17 (BatchN (None, 27, 27, 512) 2048 ['conv2d_17[0][0]']
ormalization)
add_4 (Add) (None, 27, 27, 512) 0 ['batch_normalization_17[0][0]',
'activation_12[0][0]']
activation_15 (Activation) (None, 27, 27, 512) 0 ['add_4[0][0]']
conv2d_18 (Conv2D) (None, 27, 27, 128) 65664 ['activation_15[0][0]']
batch_normalization_18 (BatchN (None, 27, 27, 128) 512 ['conv2d_18[0][0]']
ormalization)
activation_16 (Activation) (None, 27, 27, 128) 0 ['batch_normalization_18[0][0]']
conv2d_19 (Conv2D) (None, 27, 27, 128) 147584 ['activation_16[0][0]']
batch_normalization_19 (BatchN (None, 27, 27, 128) 512 ['conv2d_19[0][0]']
ormalization)
activation_17 (Activation) (None, 27, 27, 128) 0 ['batch_normalization_19[0][0]']
conv2d_20 (Conv2D) (None, 27, 27, 512) 66048 ['activation_17[0][0]']
batch_normalization_20 (BatchN (None, 27, 27, 512) 2048 ['conv2d_20[0][0]']
ormalization)
add_5 (Add) (None, 27, 27, 512) 0 ['batch_normalization_20[0][0]',
'activation_15[0][0]']
activation_18 (Activation) (None, 27, 27, 512) 0 ['add_5[0][0]']
conv2d_21 (Conv2D) (None, 27, 27, 128) 65664 ['activation_18[0][0]']
batch_normalization_21 (BatchN (None, 27, 27, 128) 512 ['conv2d_21[0][0]']
ormalization)
activation_19 (Activation) (None, 27, 27, 128) 0 ['batch_normalization_21[0][0]']
conv2d_22 (Conv2D) (None, 27, 27, 128) 147584 ['activation_19[0][0]']
batch_normalization_22 (BatchN (None, 27, 27, 128) 512 ['conv2d_22[0][0]']
ormalization)
activation_20 (Activation) (None, 27, 27, 128) 0 ['batch_normalization_22[0][0]']
conv2d_23 (Conv2D) (None, 27, 27, 512) 66048 ['activation_20[0][0]']
batch_normalization_23 (BatchN (None, 27, 27, 512) 2048 ['conv2d_23[0][0]']
ormalization)
add_6 (Add) (None, 27, 27, 512) 0 ['batch_normalization_23[0][0]',
'activation_18[0][0]']
activation_21 (Activation) (None, 27, 27, 512) 0 ['add_6[0][0]']
conv2d_24 (Conv2D) (None, 14, 14, 256) 131328 ['activation_21[0][0]']
batch_normalization_24 (BatchN (None, 14, 14, 256) 1024 ['conv2d_24[0][0]']
ormalization)
activation_22 (Activation) (None, 14, 14, 256) 0 ['batch_normalization_24[0][0]']
conv2d_25 (Conv2D) (None, 14, 14, 256) 590080 ['activation_22[0][0]']
batch_normalization_25 (BatchN (None, 14, 14, 256) 1024 ['conv2d_25[0][0]']
ormalization)
activation_23 (Activation) (None, 14, 14, 256) 0 ['batch_normalization_25[0][0]']
conv2d_26 (Conv2D) (None, 14, 14, 1024 263168 ['activation_23[0][0]']
)
conv2d_27 (Conv2D) (None, 14, 14, 1024 525312 ['activation_21[0][0]']
)
batch_normalization_26 (BatchN (None, 14, 14, 1024 4096 ['conv2d_26[0][0]']
ormalization) )
batch_normalization_27 (BatchN (None, 14, 14, 1024 4096 ['conv2d_27[0][0]']
ormalization) )
add_7 (Add) (None, 14, 14, 1024 0 ['batch_normalization_26[0][0]',
) 'batch_normalization_27[0][0]']
activation_24 (Activation) (None, 14, 14, 1024 0 ['add_7[0][0]']
)
conv2d_28 (Conv2D) (None, 14, 14, 256) 262400 ['activation_24[0][0]']
batch_normalization_28 (BatchN (None, 14, 14, 256) 1024 ['conv2d_28[0][0]']
ormalization)
activation_25 (Activation) (None, 14, 14, 256) 0 ['batch_normalization_28[0][0]']
conv2d_29 (Conv2D) (None, 14, 14, 256) 590080 ['activation_25[0][0]']
batch_normalization_29 (BatchN (None, 14, 14, 256) 1024 ['conv2d_29[0][0]']
ormalization)
activation_26 (Activation) (None, 14, 14, 256) 0 ['batch_normalization_29[0][0]']
conv2d_30 (Conv2D) (None, 14, 14, 1024 263168 ['activation_26[0][0]']
)
batch_normalization_30 (BatchN (None, 14, 14, 1024 4096 ['conv2d_30[0][0]']
ormalization) )
add_8 (Add) (None, 14, 14, 1024 0 ['batch_normalization_30[0][0]',
) 'activation_24[0][0]']
activation_27 (Activation) (None, 14, 14, 1024 0 ['add_8[0][0]']
)
conv2d_31 (Conv2D) (None, 14, 14, 256) 262400 ['activation_27[0][0]']
batch_normalization_31 (BatchN (None, 14, 14, 256) 1024 ['conv2d_31[0][0]']
ormalization)
activation_28 (Activation) (None, 14, 14, 256) 0 ['batch_normalization_31[0][0]']
conv2d_32 (Conv2D) (None, 14, 14, 256) 590080 ['activation_28[0][0]']
batch_normalization_32 (BatchN (None, 14, 14, 256) 1024 ['conv2d_32[0][0]']
ormalization)
activation_29 (Activation) (None, 14, 14, 256) 0 ['batch_normalization_32[0][0]']
conv2d_33 (Conv2D) (None, 14, 14, 1024 263168 ['activation_29[0][0]']
)
batch_normalization_33 (BatchN (None, 14, 14, 1024 4096 ['conv2d_33[0][0]']
ormalization) )
add_9 (Add) (None, 14, 14, 1024 0 ['batch_normalization_33[0][0]',
) 'activation_27[0][0]']
activation_30 (Activation) (None, 14, 14, 1024 0 ['add_9[0][0]']
)
conv2d_34 (Conv2D) (None, 14, 14, 256) 262400 ['activation_30[0][0]']
batch_normalization_34 (BatchN (None, 14, 14, 256) 1024 ['conv2d_34[0][0]']
ormalization)
activation_31 (Activation) (None, 14, 14, 256) 0 ['batch_normalization_34[0][0]']
conv2d_35 (Conv2D) (None, 14, 14, 256) 590080 ['activation_31[0][0]']
batch_normalization_35 (BatchN (None, 14, 14, 256) 1024 ['conv2d_35[0][0]']
ormalization)
activation_32 (Activation) (None, 14, 14, 256) 0 ['batch_normalization_35[0][0]']
conv2d_36 (Conv2D) (None, 14, 14, 1024 263168 ['activation_32[0][0]']
)
batch_normalization_36 (BatchN (None, 14, 14, 1024 4096 ['conv2d_36[0][0]']
ormalization) )
add_10 (Add) (None, 14, 14, 1024 0 ['batch_normalization_36[0][0]',
) 'activation_30[0][0]']
activation_33 (Activation) (None, 14, 14, 1024 0 ['add_10[0][0]']
)
conv2d_37 (Conv2D) (None, 14, 14, 256) 262400 ['activation_33[0][0]']
batch_normalization_37 (BatchN (None, 14, 14, 256) 1024 ['conv2d_37[0][0]']
ormalization)
activation_34 (Activation) (None, 14, 14, 256) 0 ['batch_normalization_37[0][0]']
conv2d_38 (Conv2D) (None, 14, 14, 256) 590080 ['activation_34[0][0]']
batch_normalization_38 (BatchN (None, 14, 14, 256) 1024 ['conv2d_38[0][0]']
ormalization)
activation_35 (Activation) (None, 14, 14, 256) 0 ['batch_normalization_38[0][0]']
conv2d_39 (Conv2D) (None, 14, 14, 1024 263168 ['activation_35[0][0]']
)
batch_normalization_39 (BatchN (None, 14, 14, 1024 4096 ['conv2d_39[0][0]']
ormalization) )
add_11 (Add) (None, 14, 14, 1024 0 ['batch_normalization_39[0][0]',
) 'activation_33[0][0]']
activation_36 (Activation) (None, 14, 14, 1024 0 ['add_11[0][0]']
)
conv2d_40 (Conv2D) (None, 14, 14, 256) 262400 ['activation_36[0][0]']
batch_normalization_40 (BatchN (None, 14, 14, 256) 1024 ['conv2d_40[0][0]']
ormalization)
activation_37 (Activation) (None, 14, 14, 256) 0 ['batch_normalization_40[0][0]']
conv2d_41 (Conv2D) (None, 14, 14, 256) 590080 ['activation_37[0][0]']
batch_normalization_41 (BatchN (None, 14, 14, 256) 1024 ['conv2d_41[0][0]']
ormalization)
activation_38 (Activation) (None, 14, 14, 256) 0 ['batch_normalization_41[0][0]']
conv2d_42 (Conv2D) (None, 14, 14, 1024 263168 ['activation_38[0][0]']
)
batch_normalization_42 (BatchN (None, 14, 14, 1024 4096 ['conv2d_42[0][0]']
ormalization) )
add_12 (Add) (None, 14, 14, 1024 0 ['batch_normalization_42[0][0]',
) 'activation_36[0][0]']
activation_39 (Activation) (None, 14, 14, 1024 0 ['add_12[0][0]']
)
conv2d_43 (Conv2D) (None, 7, 7, 512) 524800 ['activation_39[0][0]']
batch_normalization_43 (BatchN (None, 7, 7, 512) 2048 ['conv2d_43[0][0]']
ormalization)
activation_40 (Activation) (None, 7, 7, 512) 0 ['batch_normalization_43[0][0]']
conv2d_44 (Conv2D) (None, 7, 7, 512) 2359808 ['activation_40[0][0]']
batch_normalization_44 (BatchN (None, 7, 7, 512) 2048 ['conv2d_44[0][0]']
ormalization)
activation_41 (Activation) (None, 7, 7, 512) 0 ['batch_normalization_44[0][0]']
conv2d_45 (Conv2D) (None, 7, 7, 2048) 1050624 ['activation_41[0][0]']
conv2d_46 (Conv2D) (None, 7, 7, 2048) 2099200 ['activation_39[0][0]']
batch_normalization_45 (BatchN (None, 7, 7, 2048) 8192 ['conv2d_45[0][0]']
ormalization)
batch_normalization_46 (BatchN (None, 7, 7, 2048) 8192 ['conv2d_46[0][0]']
ormalization)
add_13 (Add) (None, 7, 7, 2048) 0 ['batch_normalization_45[0][0]',
'batch_normalization_46[0][0]']
activation_42 (Activation) (None, 7, 7, 2048) 0 ['add_13[0][0]']
conv2d_47 (Conv2D) (None, 7, 7, 512) 1049088 ['activation_42[0][0]']
batch_normalization_47 (BatchN (None, 7, 7, 512) 2048 ['conv2d_47[0][0]']
ormalization)
activation_43 (Activation) (None, 7, 7, 512) 0 ['batch_normalization_47[0][0]']
conv2d_48 (Conv2D) (None, 7, 7, 512) 2359808 ['activation_43[0][0]']
batch_normalization_48 (BatchN (None, 7, 7, 512) 2048 ['conv2d_48[0][0]']
ormalization)
activation_44 (Activation) (None, 7, 7, 512) 0 ['batch_normalization_48[0][0]']
conv2d_49 (Conv2D) (None, 7, 7, 2048) 1050624 ['activation_44[0][0]']
batch_normalization_49 (BatchN (None, 7, 7, 2048) 8192 ['conv2d_49[0][0]']
ormalization)
add_14 (Add) (None, 7, 7, 2048) 0 ['batch_normalization_49[0][0]',
'activation_42[0][0]']
activation_45 (Activation) (None, 7, 7, 2048) 0 ['add_14[0][0]']
conv2d_50 (Conv2D) (None, 7, 7, 512) 1049088 ['activation_45[0][0]']
batch_normalization_50 (BatchN (None, 7, 7, 512) 2048 ['conv2d_50[0][0]']
ormalization)
activation_46 (Activation) (None, 7, 7, 512) 0 ['batch_normalization_50[0][0]']
conv2d_51 (Conv2D) (None, 7, 7, 512) 2359808 ['activation_46[0][0]']
batch_normalization_51 (BatchN (None, 7, 7, 512) 2048 ['conv2d_51[0][0]']
ormalization)
activation_47 (Activation) (None, 7, 7, 512) 0 ['batch_normalization_51[0][0]']
conv2d_52 (Conv2D) (None, 7, 7, 2048) 1050624 ['activation_47[0][0]']
batch_normalization_52 (BatchN (None, 7, 7, 2048) 8192 ['conv2d_52[0][0]']
ormalization)
add_15 (Add) (None, 7, 7, 2048) 0 ['batch_normalization_52[0][0]',
'activation_45[0][0]']
activation_48 (Activation) (None, 7, 7, 2048) 0 ['add_15[0][0]']
global_max_pooling2d (GlobalMa (None, 2048) 0 ['activation_48[0][0]']
xPooling2D)
dense (Dense) (None, 1000) 2049000 ['global_max_pooling2d[0][0]']
==================================================================================================
Total params: 25,636,712
Trainable params: 25,583,592
Non-trainable params: 53,120
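The parameter totals reported in the summary can be cross-checked without TensorFlow. The sketch below assumes that every convolution has a bias term and is followed by a batch normalization layer with two trainable and two non-trainable values per channel, which is how the layers in the listing above are configured. All function names are made up for this example.

```python
def conv_params(kernel, c_in, c_out):
    """Weights plus biases of a kernel x kernel convolution."""
    return kernel * kernel * c_in * c_out + c_out


def bottleneck_params(c_in, filters, projection):
    """Return (trainable, non_trainable) parameter counts for one block."""
    f1, f2, f3 = filters
    convs = [(1, c_in, f1), (3, f1, f2), (1, f2, f3)]
    if projection:  # conv_block adds a 1x1 convolution on the shortcut
        convs.append((1, c_in, f3))
    trainable = non_trainable = 0
    for kernel, ci, co in convs:
        trainable += conv_params(kernel, ci, co) + 2 * co  # conv + BN gamma/beta
        non_trainable += 2 * co                            # BN moving mean/variance
    return trainable, non_trainable


def resnet50_params(classes=1000):
    """Return (trainable, non_trainable) parameter counts for the whole model."""
    stages = [((64, 64, 256), 3), ((128, 128, 512), 4),
              ((256, 256, 1024), 6), ((512, 512, 2048), 3)]
    trainable = conv_params(7, 3, 64) + 2 * 64  # stem convolution + BN gamma/beta
    non_trainable = 2 * 64                      # stem BN moving statistics
    c_in = 64
    for filters, num_blocks in stages:
        for index in range(num_blocks):
            t, n = bottleneck_params(c_in, filters, projection=(index == 0))
            trainable += t
            non_trainable += n
            c_in = filters[2]
    trainable += c_in * classes + classes       # final dense layer
    return trainable, non_trainable


trainable, non_trainable = resnet50_params()
print(trainable, non_trainable, trainable + non_trainable)
```

The result, 25,583,592 trainable and 53,120 non-trainable parameters, agrees with the totals in the summary.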
Description
- The three functions identity_block(), conv_block(), and resnet50() are defined in the implementation.
- The identity_block() function implements the identity block of the ResNet architecture. Its inputs are the output of the preceding layer and the number of filters used in each of its three convolutional layers.
- It first saves the input as a shortcut, then applies 1x1, 3x3, and 1x1 convolutions in sequence. Each convolution is followed by batch normalization, and the first two are also followed by a ReLU activation. Finally, it adds the shortcut to the normalized output of the third convolution, applies another ReLU activation, and returns the result.
- The conv_block() function implements the convolutional block of the ResNet architecture. Its inputs are the output of the preceding layer, the number of filters used in each convolutional layer, and the strides of the first convolutional layer.
- It applies the same sequence of 1x1, 3x3, and 1x1 convolutions with batch normalization and ReLU activation. In addition, it applies a 1x1 convolution with the same strides, followed by batch normalization, to the shortcut so that the shortcut's shape matches the output of the third convolution. Finally, it adds the shortcut to that output, applies another ReLU activation, and returns the result.
- The resnet50() function implements the ResNet50 model using the previously defined identity_block() and conv_block() functions. It takes as input the shape of the input image and the number of output classes. It first applies a 7x7 convolution with a stride of 2 to the input image, followed by batch normalization and ReLU activation. It then applies max pooling with a 3x3 window and a stride of 2.
- Next, it applies four stages, each consisting of one conv_block() followed by two, three, five, and two identity_block() calls respectively. From stage to stage the number of filters increases and the spatial dimensions decrease. Finally, it applies global max pooling, followed by a dense layer with softmax activation that produces the final output. The resulting model is returned.
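The spatial sizes listed in the model summary follow from the kernel sizes and strides described above. The sketch below assumes the Keras 'valid' padding rule, floor((size - kernel) / stride) + 1, and traces only the layers that change the spatial size; the 3x3 convolutions use 'same' padding with stride 1 and leave it unchanged. The helper name `valid_output_size` is made up for this example.

```python
def valid_output_size(size, kernel, stride):
    """Spatial output size of a convolution or pooling layer with 'valid' padding."""
    return (size - kernel) // stride + 1


size = 224
trace = [size]

size = valid_output_size(size, kernel=7, stride=2)  # 7x7 stem convolution
trace.append(size)

size = valid_output_size(size, kernel=3, stride=2)  # 3x3 max pooling
trace.append(size)

for stride in (1, 2, 2, 2):  # first 1x1 convolution of each conv_block
    size = valid_output_size(size, kernel=1, stride=stride)
    trace.append(size)

print(trace)  # [224, 109, 54, 54, 27, 14, 7]
```

Because the shortcut convolution in conv_block() uses the same strides as the first 1x1 convolution, both paths arrive at the addition with the same spatial size.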
Conclusion
ResNet is a powerful and popular deep learning architecture that has significantly advanced a variety of computer vision tasks, including object detection and recognition. Its residual connections let information bypass layers, which makes very deep networks trainable and enables them to learn more accurate and complex features. ResNet is also practical for both researchers and practitioners because it is simple to implement. It has become a standard architecture for computer vision, since numerous benchmarks have shown strong performance compared with other architectures.