Hyperparameter Tuning
Introduction
Hyperparameter tuning is the process of finding the set of hyperparameters that gives a deep learning model its best performance on a given task.
Hyperparameters are settings that are fixed before training starts rather than learned from the data. Some define the model's architecture, such as the number of layers and the number of neurons in each layer; others control the training process, such as the learning rate, the batch size, and the regularization strength.
Tuning means selecting the hyperparameter values that maximize the model's performance on a held-out validation set. Because many hyperparameter combinations have to be trained and evaluated, it is a time-consuming process.
Techniques
Several strategies are used for hyperparameter tuning in deep learning, including:
- Grid search: Define a set of candidate values for each hyperparameter, then train and evaluate the model on every possible combination of those values to find the best one.
- Random search: Similar to grid search, but instead of trying every combination we define a range of values for each hyperparameter and sample combinations from those ranges at random.
- Bayesian optimization: Uses a probabilistic model to predict how each set of hyperparameters will perform, based on the trials evaluated so far, and uses those predictions to choose the most promising set to try next.
- Gradient-based optimization: Optimizes the hyperparameters using the gradient of the loss function with respect to the hyperparameters.
- Evolutionary algorithms: Inspired by natural selection, evolutionary algorithms use ideas like mutation, selection, and crossover to determine the ideal set of hyperparameters.
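The selection, crossover, and mutation steps named in the last item can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a production tuner: the `fitness` function is a hypothetical stand-in for "train a model and return its validation accuracy", and the hyperparameter names, ranges, and mutation rate are chosen purely for the example.

```python
import random

def fitness(params):
    """Hypothetical stand-in for validation accuracy; peaks at lr=0.01, units=128."""
    lr_penalty = abs(params["learning_rate"] - 0.01) * 10
    units_penalty = abs(params["units"] - 128) / 1000
    return 1.0 - lr_penalty - units_penalty

def random_individual(rng):
    """Draw one random hyperparameter set."""
    return {
        "learning_rate": 10 ** rng.uniform(-4, -1),
        "units": rng.randrange(32, 513, 32),
    }

def crossover(parent_a, parent_b, rng):
    """Inherit each hyperparameter from one of the two parents at random."""
    return {name: rng.choice([parent_a[name], parent_b[name]]) for name in parent_a}

def mutate(individual, rng, rate=0.3):
    """With probability `rate` per hyperparameter, nudge its value."""
    child = dict(individual)
    if rng.random() < rate:
        scaled = child["learning_rate"] * 10 ** rng.uniform(-0.5, 0.5)
        child["learning_rate"] = min(0.1, max(1e-4, scaled))
    if rng.random() < rate:
        shifted = child["units"] + rng.choice([-32, 32])
        child["units"] = min(512, max(32, shifted))
    return child

def evolve(generations=10, population_size=8, seed=0):
    rng = random.Random(seed)
    population = [random_individual(rng) for _ in range(population_size)]
    initial_best = max(fitness(p) for p in population)
    for _ in range(generations):
        # Selection: the fitter half survives unchanged and becomes the parents.
        population.sort(key=fitness, reverse=True)
        parents = population[: population_size // 2]
        # Crossover and mutation produce children that refill the population.
        children = []
        while len(parents) + len(children) < population_size:
            parent_a, parent_b = rng.sample(parents, 2)
            children.append(mutate(crossover(parent_a, parent_b, rng), rng))
        population = parents + children
    best = max(population, key=fitness)
    return best, fitness(best), initial_best

best, best_score, initial_best = evolve()
print(best, best_score)
```

Because the parents are carried over unchanged, the best score can never get worse from one generation to the next. In a real setting each call to `fitness` would be a full training run, so the population size and number of generations set the compute budget.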
Grid search, random search, and Bayesian optimization are the frequently used methods for hyperparameter tuning. Each one explores the hyperparameter space and evaluates the model's performance for different combinations of hyperparameters.
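To make the difference between grid search and random search concrete, here is a small sketch using only the Python standard library. The `validation_score` function is a hypothetical stand-in for training a model and measuring its validation accuracy, and the candidate values and ranges are assumptions made for the example.

```python
import itertools
import random

def validation_score(learning_rate, batch_size):
    """Hypothetical stand-in for training a model and measuring validation accuracy."""
    return 1.0 - abs(learning_rate - 0.01) * 10 - abs(batch_size - 64) / 1000

def grid_search(grid):
    """Evaluate every combination of the listed values and return the best one."""
    names = list(grid)
    best_params, best_score = None, float("-inf")
    for values in itertools.product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        score = validation_score(**params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

def random_search(n_trials, seed=0):
    """Sample n_trials random combinations from the ranges and return the best one."""
    rng = random.Random(seed)
    best_params, best_score = None, float("-inf")
    for _ in range(n_trials):
        params = {
            # Sample the learning rate log-uniformly between 0.001 and 0.1.
            "learning_rate": 10 ** rng.uniform(-3, -1),
            "batch_size": rng.choice([16, 32, 64, 128, 256]),
        }
        score = validation_score(**params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

grid = {"learning_rate": [0.1, 0.01, 0.001], "batch_size": [32, 64, 128]}
grid_best, grid_score = grid_search(grid)      # evaluates 3 * 3 = 9 combinations
random_best, random_score = random_search(9)   # same budget, random samples
print("grid:", grid_best, grid_score)
print("random:", random_best, random_score)
```

Grid search only ever sees the values listed in the grid, while random search can land on values in between. The cost of grid search also multiplies with every hyperparameter added, whereas the budget of random search is set directly by the number of trials.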
Why use hyperparameter tuning in deep learning?
Hyperparameter tuning matters in deep learning because it can considerably raise a model's performance on a given task. Finding a good set of hyperparameters can improve generalization, reduce overfitting, and increase model accuracy. Without tuning, a model may perform below its potential.
When to use hyperparameter tuning in deep learning?
Hyperparameter tuning is used to boost a model's performance on a particular task. It is usually carried out once a basic model architecture is in place and its hyperparameters need to be fine-tuned to improve the results.
How to apply hyperparameter tuning in deep learning?
Grid search, random search, and Bayesian optimization are a few of the methods available for hyperparameter tuning in deep learning. Each explores the hyperparameter space and evaluates the model's performance for different hyperparameter combinations. To apply hyperparameter tuning, first identify the hyperparameters to optimize, then choose a tuning method, and finally establish a validation mechanism for assessing the performance of each hyperparameter combination.
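The three steps above can be sketched end to end on a problem small enough to run instantly. As an assumption for the sake of speed, this example swaps the deep learning model for a closed-form ridge regression on synthetic data; the data, the candidate values, and the helper names are all illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic regression data (hypothetical, for illustration only).
X = rng.normal(size=(200, 5))
true_w = np.array([1.0, -2.0, 0.5, 0.0, 3.0])
y = X @ true_w + rng.normal(scale=0.5, size=200)

# Validation mechanism: hold out the last 25% of the samples.
X_fit, X_val = X[:150], X[150:]
y_fit, y_val = y[:150], y[150:]

def fit_ridge(features, targets, strength):
    """Closed-form ridge regression: w = (X^T X + strength * I)^-1 X^T y."""
    n_features = features.shape[1]
    gram = features.T @ features + strength * np.eye(n_features)
    return np.linalg.solve(gram, features.T @ targets)

def validation_mse(weights):
    """Mean squared error on the held-out validation samples."""
    return float(np.mean((X_val @ weights - y_val) ** 2))

# Hyperparameter to optimize: the regularization strength.
# Tuning method: a small grid search over candidate values.
candidates = [0.001, 0.01, 0.1, 1.0, 10.0, 100.0]
scores = {s: validation_mse(fit_ridge(X_fit, y_fit, s)) for s in candidates}
best_strength = min(scores, key=scores.get)
print(best_strength, scores[best_strength])
```

The structure stays the same for a neural network: only `fit_ridge` and `validation_mse` would be replaced by a training run and a validation metric.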
Implementation
Platform: Colab notebook
Dataset: MNIST
Source code
# Import the necessary libraries
# Keras Tuner is a separate package; in Colab, install it first with: !pip install keras-tuner
from tensorflow.keras.datasets import mnist
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.utils import to_categorical
import keras_tuner as kt
- Load and preprocess the dataset: The code above imports the libraries needed to build, train, and tune a CNN (Convolutional Neural Network) on the MNIST dataset with Keras Tuner. The code below loads the MNIST dataset and reshapes it for use with a CNN: the input images, which are 28x28 grayscale images, are given the dimensions (28, 28, 1), and the to_categorical function one-hot encodes the output labels.
# Load the MNIST dataset
(X_train, y_train), (X_test, y_test) = mnist.load_data()
# Reshape the data for use in a CNN
X_train = X_train.reshape(-1, 28, 28, 1)
X_test = X_test.reshape(-1, 28, 28, 1)
y_train = to_categorical(y_train)
y_test = to_categorical(y_test)
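To see what the reshape and the one-hot encoding do without downloading MNIST, the following NumPy-only sketch applies the same two operations to a tiny made-up batch. The array contents and names are hypothetical; `np.eye(10)[labels]` is used here as a stand-in that mirrors what to_categorical produces for ten classes.

```python
import numpy as np

# Hypothetical stand-in for a tiny batch of MNIST: 4 grayscale 28x28 images.
images = np.zeros((4, 28, 28), dtype=np.uint8)
labels = np.array([3, 0, 9, 3])

# Add a trailing channel axis so each image has shape (28, 28, 1).
images_cnn = images.reshape(-1, 28, 28, 1)

# One-hot encode: row i is all zeros except for a 1 at column labels[i].
one_hot = np.eye(10, dtype=np.float32)[labels]

print(images_cnn.shape)  # (4, 28, 28, 1)
print(one_hot.shape)     # (4, 10)
print(one_hot[0])        # 1.0 at index 3, 0.0 elsewhere
```

The `-1` in the reshape lets NumPy infer the number of images, so the same line works for both the training and the test split.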
- Define the CNN model and its hyperparameters: This function builds a CNN model whose hyperparameters are supplied by Keras Tuner. The tuned hyperparameters are the number of filters in each convolutional layer (conv1_filter and conv2_filter), the kernel size of each convolutional layer (conv1_kernel and conv2_kernel), the number of units in the first dense layer (dense1_units), and the learning rate of the Adam optimizer (learning_rate).
# Define a function to create the CNN model
def create_model(hp):
    model = Sequential()
    # First convolutional layer: tune the number of filters and the kernel size
    model.add(Conv2D(filters=hp.Int('conv1_filter', min_value=32, max_value=128, step=16),
                     kernel_size=hp.Choice('conv1_kernel', values=[3, 5]),
                     activation='relu',
                     input_shape=(28, 28, 1)))
    # Second convolutional layer: tuned in the same way
    model.add(Conv2D(filters=hp.Int('conv2_filter', min_value=32, max_value=128, step=16),
                     kernel_size=hp.Choice('conv2_kernel', values=[3, 5]),
                     activation='relu'))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Flatten())
    # Dense layer: tune the number of units
    model.add(Dense(units=hp.Int('dense1_units', min_value=32, max_value=512, step=32),
                    activation='relu'))
    # Output layer: one unit per digit class
    model.add(Dense(10, activation='softmax'))
    # Tune the learning rate of the Adam optimizer
    model.compile(optimizer=Adam(learning_rate=hp.Choice('learning_rate', values=[0.01, 0.001])),
                  loss='categorical_crossentropy',
                  metrics=['accuracy'])
    return model
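It can help to work out how large the search space defined by create_model is. The sketch below counts the combinations, assuming that hp.Int treats max_value as inclusive (which is how the Keras Tuner documentation describes it); the helper name `int_range_count` is made up for this example.

```python
from math import prod

def int_range_count(min_value, max_value, step):
    """Number of values in an inclusive integer range with a fixed step."""
    return (max_value - min_value) // step + 1

choices_per_hyperparameter = {
    "conv1_filter": int_range_count(32, 128, 16),   # 32, 48, ..., 128
    "conv1_kernel": 2,                              # 3 or 5
    "conv2_filter": int_range_count(32, 128, 16),   # 32, 48, ..., 128
    "conv2_kernel": 2,                              # 3 or 5
    "dense1_units": int_range_count(32, 512, 32),   # 32, 64, ..., 512
    "learning_rate": 2,                             # 0.01 or 0.001
}

total_combinations = prod(choices_per_hyperparameter.values())
print(choices_per_hyperparameter)
print(total_combinations)  # 7 * 2 * 7 * 2 * 16 * 2 = 6272
```

With thousands of possible combinations, an exhaustive grid search would mean thousands of training runs, which is why a random search with a small trial budget samples only a tiny fraction of the space.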
- Define the tuner: The Keras Tuner object (tuner) defined in this code tunes the hyperparameters specified in create_model using a random search. The objective is to maximize validation accuracy, and the maximum number of trials is set to 2 (max_trials=2). The directory and project_name arguments determine where the search results are saved.
# Define the tuner object
tuner = kt.RandomSearch(
    create_model,
    objective='val_accuracy',
    max_trials=2,
    directory='test_dir',
    project_name='mnist_classification'
)
- Hyperparameter tuning here is carried out with keras_tuner rather than GridSearchCV. The create_model function takes an hp argument, through which Keras Tuner supplies the hyperparameter values for each trial; the function uses those values to build and compile the CNN model.
- RandomSearch from keras_tuner looks for the best hyperparameters. The objective is set to val_accuracy, max_trials limits the number of models tested, and directory and project_name specify where the search results are saved.
- The search is then run, and the best model and hyperparameters found by the tuner object are printed.
# Run the search (note that the test split is used as the validation data here)
tuner.search(X_train, y_train, epochs=2, validation_data=(X_test, y_test))
# Retrieve the best model and hyperparameters
best_model = tuner.get_best_models(num_models=1)[0]
best_hyperparameters = tuner.get_best_hyperparameters(num_trials=1)[0]
# summary() prints the table itself and returns None, so call it directly
best_model.summary()
# .values holds the chosen hyperparameters as a dictionary
print(f"Best Hyperparameters: {best_hyperparameters.values}")
Obtained Output:
Trial 2 Complete [00h 15m 23s]
val_accuracy: 0.9824000000953674
Best val_accuracy So Far: 0.9824000000953674
Total elapsed time: 00h 23m 56s
Model: "sequential"
_________________________________________________________________
 Layer (type)                  Output Shape              Param #
=================================================================
 conv2d (Conv2D)               (None, 26, 26, 112)       1120
 conv2d_1 (Conv2D)             (None, 24, 24, 64)        64576
 max_pooling2d (MaxPooling2D)  (None, 12, 12, 64)        0
 flatten (Flatten)             (None, 9216)              0
 dense (Dense)                 (None, 32)                294944
 dense_1 (Dense)               (None, 10)                330
=================================================================
Total params: 360,970
Trainable params: 360,970
Non-trainable params: 0
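The parameter counts in the summary can be checked by hand. The sketch below reproduces them, assuming both convolutional layers ended up with a kernel size of 3, which is what the output shapes suggest (28 to 26 to 24 with no padding); the helper functions are written for this example only.

```python
def conv2d_params(kernel_size, in_channels, filters):
    """Each filter has kernel_size * kernel_size * in_channels weights plus one bias."""
    return (kernel_size * kernel_size * in_channels + 1) * filters

def dense_params(inputs, units):
    """Each unit has one weight per input plus one bias."""
    return (inputs + 1) * units

conv1 = conv2d_params(kernel_size=3, in_channels=1, filters=112)
conv2 = conv2d_params(kernel_size=3, in_channels=112, filters=64)
flattened = 12 * 12 * 64            # output of the pooling layer, flattened
dense1 = dense_params(flattened, 32)
dense2 = dense_params(32, 10)
total = conv1 + conv2 + dense1 + dense2

print(conv1, conv2, flattened, dense1, dense2)  # 1120 64576 9216 294944 330
print(total)                                    # 360970
```

Most of the parameters sit in the first dense layer, because it connects all 9,216 flattened features to each of its units.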
Note: On any Python platform, the code above takes around half an hour to print its output, because each trial trains a model before the best accuracy is reported. Be patient!
Key Points to Remember
- Hyperparameter tuning is an iterative process. Finding the ideal set of hyperparameters may take many rounds of trial and evaluation.
- Beyond the strategies covered here (such as grid search, random search, and Bayesian optimization), there are other approaches to hyperparameter tuning, including genetic algorithms and swarm optimization.
- The best hyperparameters depend on the architecture of the model. A recurrent neural network (RNN) may have different optimal hyperparameters than a convolutional neural network (CNN), for instance.
- Hyperparameter tuning is just one aspect of deep learning model development. Other important steps include data preprocessing, model architecture design, regularization, and evaluation.
- Hyperparameter tuning is not a silver bullet that guarantees improved performance. It's possible to spend a lot of time tuning hyperparameters and end up with a model that doesn't perform much better than a baseline model with default hyperparameters.
Conclusion
In conclusion, tuning the hyperparameters is an essential step in building strong deep learning models, because choosing good hyperparameters can improve a model's performance. In this example, we trained a convolutional neural network on the MNIST dataset and used Keras Tuner to run a random search over several hyperparameters. The search returned the best set of hyperparameters it found and the corresponding model with the highest validation accuracy. The same tuning methods can be used to fine-tune models and improve their performance on a variety of tasks.