ResNet
Introduction
The Residual Network, or simply ResNet, is a deep neural network architecture first presented in 2015 by Kaiming He et al. It is known for making it possible to train networks considerably deeper than was previously practical, by mitigating the vanishing gradient problem that arises when training very deep neural networks.
The main principle of ResNet is the residual (skip) connection, which lets the input of a group of layers bypass those layers and be added directly to their output. If the desired mapping of such a group is f(x), the layers only need to learn the residual f(x) - x, with the input x added back to recover f(x), rather than the entire mapping from input to output. As a result, gradients do not shrink excessively as they travel back through the layers, which makes very deep networks much easier to train.
ResNet is widely used in computer vision, especially for image classification and object detection. It has been shown to outperform other state-of-the-art architectures on several benchmark datasets. ResNet also comes in variants of different depths and numbers of layers, such as ResNet-50, ResNet-101, and ResNet-152.
History
Architecture
Residual Block
- f(x) is the desired underlying mapping that we want the block to learn; it serves as the input to the activation function above the block.
- The part inside the dotted box therefore only needs to learn the residual mapping f(x) - x.
- The solid line carrying the input x directly to the addition operator is the residual connection.
- The residual block consists of two 3x3 convolutional layers with the same number of output channels, each followed by a batch normalization layer and a ReLU activation function; a minimal code sketch of this block is given below.
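As a rough illustration of the structure just described, the following minimal sketch builds the basic two-convolution residual block with the Keras functional API. It is not the bottleneck block used later in the Implementation section, and the input shape and filter count are assumptions chosen only for the example.
import tensorflow as tf
from tensorflow.keras.layers import Conv2D, BatchNormalization, Activation, Add

def basic_residual_block(x, filters):
    # Residual connection: keep a reference to the block input x
    shortcut = x
    # First 3x3 convolution, followed by batch normalization and ReLU
    y = Conv2D(filters, kernel_size=(3, 3), padding='same')(x)
    y = BatchNormalization()(y)
    y = Activation('relu')(y)
    # Second 3x3 convolution with the same number of output channels
    y = Conv2D(filters, kernel_size=(3, 3), padding='same')(y)
    y = BatchNormalization()(y)
    # Add the input back, so the convolutions only have to learn f(x) - x
    y = Add()([y, shortcut])
    return Activation('relu')(y)

# Example usage: the input must already have `filters` channels for the addition to work
inputs = tf.keras.Input(shape=(56, 56, 64))
outputs = basic_residual_block(inputs, filters=64)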
ResNet-50, one of the best-known ResNet variants, contains 50 layers and achieved state-of-the-art performance in the 2015 ImageNet Large Scale Visual Recognition Challenge (ILSVRC). ResNet-101 and ResNet-152 are two other common, deeper versions of the architecture.
Working
Applications
- Image Classification: ResNet achieved state-of-the-art results on the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) and on other benchmark image classification datasets.
- Object Detection: Object detection models such as Faster R-CNN and Mask R-CNN have used ResNet as their backbone architecture and achieved state-of-the-art results on the COCO object detection dataset.
- Semantic Segmentation: ResNet has been used for semantic segmentation, where the goal is to assign a class label to every pixel in an image. ResNet-based models have achieved state-of-the-art performance on datasets such as Cityscapes and PASCAL VOC.
- Transfer Learning: ResNet is widely used as a pre-trained feature extractor for transfer learning, in which a model is first trained on a large dataset and then fine-tuned on a smaller dataset for a specific task. This approach has achieved state-of-the-art performance on a variety of computer vision tasks; a minimal sketch of this workflow is given after this list.
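As a rough sketch of the transfer-learning workflow mentioned above, the snippet below loads a ResNet-50 pre-trained on ImageNet from tf.keras.applications, freezes it as a feature extractor, and adds a small classification head. The 224x224 input size, the 10-class head, and the train_ds dataset are illustrative assumptions rather than part of any particular benchmark.
import tensorflow as tf

# Load ResNet-50 pre-trained on ImageNet, without its final classification layer
base = tf.keras.applications.ResNet50(weights='imagenet', include_top=False,
                                      input_shape=(224, 224, 3))
base.trainable = False  # freeze the pre-trained backbone

# Add a small head for a hypothetical 10-class target task
inputs = tf.keras.Input(shape=(224, 224, 3))
x = tf.keras.applications.resnet50.preprocess_input(inputs)
x = base(x, training=False)
x = tf.keras.layers.GlobalAveragePooling2D()(x)
outputs = tf.keras.layers.Dense(10, activation='softmax')(x)
transfer_model = tf.keras.Model(inputs, outputs)

transfer_model.compile(optimizer='adam', loss='sparse_categorical_crossentropy',
                       metrics=['accuracy'])
# transfer_model.fit(train_ds, epochs=5)  # fine-tune the head on the smaller target dataset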
Implementation
import tensorflow as tf
from tensorflow.keras.layers import Input, Conv2D, MaxPooling2D, GlobalMaxPooling2D, Dense, Add
from tensorflow.keras.models import Model
# Define the residual blocks used to build ResNet-50
def identity_block(x, filters):
    # Identity block: the shortcut passes the block input through unchanged
    f1, f2, f3 = filters
    x_shortcut = x
    # 1x1 convolution to reduce the number of channels
    x = Conv2D(filters=f1, kernel_size=(1, 1), strides=(1, 1), padding='valid')(x)
    x = tf.keras.layers.BatchNormalization(axis=3)(x)
    x = tf.keras.layers.Activation('relu')(x)
    # 3x3 convolution
    x = Conv2D(filters=f2, kernel_size=(3, 3), strides=(1, 1), padding='same')(x)
    x = tf.keras.layers.BatchNormalization(axis=3)(x)
    x = tf.keras.layers.Activation('relu')(x)
    # 1x1 convolution to restore the number of channels
    x = Conv2D(filters=f3, kernel_size=(1, 1), strides=(1, 1), padding='valid')(x)
    x = tf.keras.layers.BatchNormalization(axis=3)(x)
    # Add the shortcut to the output of the third convolution, then apply ReLU
    x = Add()([x, x_shortcut])
    x = tf.keras.layers.Activation('relu')(x)
    return x
def conv_block(x, filters, strides):
    # Convolutional block: the shortcut is projected with a strided 1x1 convolution
    # so that its shape matches the output of the main path
    f1, f2, f3 = filters
    x_shortcut = x
    # 1x1 convolution (with strides) to reduce channels and spatial size
    x = Conv2D(filters=f1, kernel_size=(1, 1), strides=strides, padding='valid')(x)
    x = tf.keras.layers.BatchNormalization(axis=3)(x)
    x = tf.keras.layers.Activation('relu')(x)
    # 3x3 convolution
    x = Conv2D(filters=f2, kernel_size=(3, 3), strides=(1, 1), padding='same')(x)
    x = tf.keras.layers.BatchNormalization(axis=3)(x)
    x = tf.keras.layers.Activation('relu')(x)
    # 1x1 convolution to expand the number of channels
    x = Conv2D(filters=f3, kernel_size=(1, 1), strides=(1, 1), padding='valid')(x)
    x = tf.keras.layers.BatchNormalization(axis=3)(x)
    # Project the shortcut to the same shape as the main path
    x_shortcut = Conv2D(filters=f3, kernel_size=(1, 1), strides=strides, padding='valid')(x_shortcut)
    x_shortcut = tf.keras.layers.BatchNormalization(axis=3)(x_shortcut)
    # Add the projected shortcut to the main path, then apply ReLU
    x = Add()([x, x_shortcut])
    x = tf.keras.layers.Activation('relu')(x)
    return x
# Assemble the full ResNet-50 model from the blocks defined above
def resnet50(input_shape=(224, 224, 3), classes=1000):
    x_input = Input(input_shape)
    # Stem: 7x7 convolution and 3x3 max pooling ('same' padding keeps the
    # feature maps at 112x112 and 56x56, as in the standard ResNet-50)
    x = Conv2D(filters=64, kernel_size=(7, 7), strides=(2, 2), padding='same')(x_input)
    x = tf.keras.layers.BatchNormalization(axis=3)(x)
    x = tf.keras.layers.Activation('relu')(x)
    x = MaxPooling2D(pool_size=(3, 3), strides=(2, 2), padding='same')(x)
    # Stage 1: 3 blocks
    x = conv_block(x, filters=[64, 64, 256], strides=(1, 1))
    x = identity_block(x, filters=[64, 64, 256])
    x = identity_block(x, filters=[64, 64, 256])
    # Stage 2: 4 blocks
    x = conv_block(x, filters=[128, 128, 512], strides=(2, 2))
    x = identity_block(x, filters=[128, 128, 512])
    x = identity_block(x, filters=[128, 128, 512])
    x = identity_block(x, filters=[128, 128, 512])
    # Stage 3: 6 blocks
    x = conv_block(x, filters=[256, 256, 1024], strides=(2, 2))
    x = identity_block(x, filters=[256, 256, 1024])
    x = identity_block(x, filters=[256, 256, 1024])
    x = identity_block(x, filters=[256, 256, 1024])
    x = identity_block(x, filters=[256, 256, 1024])
    x = identity_block(x, filters=[256, 256, 1024])
    # Stage 4: 3 blocks
    x = conv_block(x, filters=[512, 512, 2048], strides=(2, 2))
    x = identity_block(x, filters=[512, 512, 2048])
    x = identity_block(x, filters=[512, 512, 2048])
    # Classification head
    x = GlobalMaxPooling2D()(x)
    x = Dense(classes, activation='softmax')(x)
    model = Model(inputs=x_input, outputs=x, name='resnet50')
    return model
# Summary of the Model
model = resnet50()
model.summary()
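The model built above can be trained like any other Keras model. The following lines are only a minimal sketch: train_ds stands in for a tf.data.Dataset of (image, integer label) batches and is not defined in the original code.
# Compile the model with a standard setup for multi-class classification
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
# train_ds is assumed to be a tf.data.Dataset yielding (image, label) batches,
# for example one built with tf.keras.utils.image_dataset_from_directory(...)
# model.fit(train_ds, epochs=10)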
Description
- The three functions identity_block(), conv_block(), and resnet50() are defined in the implementation.
- The identity_block() function implements the identity block of the ResNet architecture. It takes as inputs the output of the preceding layer and the number of filters used in each of its convolutional layers.
- It first saves the input as the shortcut and then applies a 1x1, a 3x3, and another 1x1 convolution in sequence, each followed by batch normalization, with a ReLU activation after the first two. Finally, it adds the shortcut to the output of the third convolution, applies one more ReLU activation, and returns the result.
- The conv_block() function implements the convolutional block of the ResNet architecture. It takes as inputs the output of the preceding layer, the number of filters used in each convolutional layer, and the strides of the first convolutional layer.
- It applies the same sequence of 1x1, 3x3, and 1x1 convolutions with batch normalization and ReLU activations, and it additionally applies a strided 1x1 convolution and batch normalization to the shortcut so that its shape matches the main path. Finally, it adds the projected shortcut to the output of the third convolution, applies one more ReLU activation, and returns the result.
- The resnet50() function implements the ResNet50 model using the previously defined identity_block() and conv_block() functions. It takes as input the shape of the input image and the number of output classes. It first applies a 7x7 convolution on the input image, followed by batch normalization and ReLU activation. It then applies max pooling with a 3x3 window and stride of 2.
- Next, it applies four stages, each consisting of one conv_block() followed by several identity_block() calls, with the number of filters increasing and the spatial dimensions decreasing from stage to stage. Finally, it applies global max pooling followed by a dense layer with softmax activation that produces the final output, and returns the resulting model. A quick shape check on the built model is sketched after this list.
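As a quick sanity check on the description above, the model defined in the Implementation section can be run on a random batch to confirm the output shape; the batch size of 2 is an arbitrary choice for the example.
import numpy as np

# Run the model built earlier on two random 224x224 RGB images
dummy_batch = np.random.rand(2, 224, 224, 3).astype('float32')
predictions = model(dummy_batch)
print(predictions.shape)  # expected: (2, 1000), one softmax distribution per image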