Overfitting & Underfitting
Introduction
Overfitting is a common problem in machine learning in which a model learns the training data too well and begins to memorize it rather than generalizing to new, unseen data. In other words, the model fits the training data too closely, which can lead to poor performance when making predictions on new data.
Occurrence
Overfitting typically occurs when a model is too complex relative to the amount of training data available, or when it is trained for too long, so that it ends up memorizing individual training examples, including their noise, instead of the underlying pattern.
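As a small illustration (a minimal sketch using scikit-learn and NumPy with an arbitrarily chosen dataset and polynomial degree; it is not part of the implementation shown later in this section), a high-capacity model fitted to only a handful of noisy points can reach a near-zero training error while doing poorly on new points from the same range:
# Overfitting sketch: a degree-9 polynomial memorizes 10 noisy training points
import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.metrics import mean_squared_error
rng = np.random.RandomState(42)
X_train = np.sort(rng.uniform(-1, 1, 10)).reshape(-1, 1)   # only 10 noisy training points
y_train = 2 * X_train.ravel() + rng.normal(0, 0.2, 10)
X_test = np.linspace(-1, 1, 100).reshape(-1, 1)            # unseen points from the same range
y_test = 2 * X_test.ravel()                                # noise-free targets for evaluation
# A degree-9 polynomial has enough parameters to (almost) interpolate the 10 training points
overfit_model = make_pipeline(PolynomialFeatures(degree=9), LinearRegression())
overfit_model.fit(X_train, y_train)
print("train MSE:", mean_squared_error(y_train, overfit_model.predict(X_train)))  # typically near zero
print("test MSE: ", mean_squared_error(y_test, overfit_model.predict(X_test)))    # typically much larger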
Underfitting
Underfitting is a common problem in machine learning where a model is too simple to capture the underlying patterns in the data, resulting in poor performance on both the training data and the validation/test data. In simple terms, a model is said to underfit when it cannot capture the complexity of the data, which leads to high bias.
Occurrence:
Underfitting can occur when the model has too few parameters or is not trained for a sufficient amount of time. A linear regression model, for example, may underfit a dataset having a nonlinear relationship between the input and output variables.
To overcome underfitting, make the model more complex by adding more layers or neurons, or by employing a more powerful model architecture. The model can also be trained for a longer period of time, or the learning rate can be increased to allow the model to learn faster.
Ultimately, finding the proper balance of model complexity and training is critical to avoiding underfitting and creating a model that performs well on both training and validation/test data.
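The sketch below (using scikit-learn with arbitrary synthetic data; an illustration rather than a definitive recipe) makes the linear-regression example above concrete: a plain linear model underfits a quadratic relationship, while adding polynomial features gives it enough capacity to capture the pattern.
# Underfitting sketch: a linear model on quadratic data versus a quadratic model
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures
from sklearn.pipeline import make_pipeline
rng = np.random.RandomState(42)
X = np.linspace(-1, 1, 200).reshape(-1, 1)
y = X.ravel() ** 2 + rng.normal(0, 0.05, 200)    # quadratic (nonlinear) relationship with mild noise
linear = LinearRegression().fit(X, y)            # too simple for this data: underfits
quadratic = make_pipeline(PolynomialFeatures(degree=2), LinearRegression()).fit(X, y)
print("linear R^2:   ", linear.score(X, y))      # low: a straight line misses the curvature
print("quadratic R^2:", quadratic.score(X, y))   # close to 1: enough capacity for the pattern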
Techniques used to overcome the Overfitting and Underfitting problems
1. Regularization: Regularization strategies add a penalty term to the loss function to discourage the model from learning overly complicated or large weights. Regularization is commonly classified into two types:
a. L1 regularization: Adds a penalty term proportional to the absolute value of the weights. This encourages the model to have sparse weights, which can aid in feature selection.
b. L2 regularization: Adds a penalty term proportional to the square of the weights. This encourages the model to keep its weights small and may help prevent overfitting.
2. Dropout: Dropout is a regularization strategy that randomly drops some neurons during training to prevent the model from becoming overly reliant on a specific set of features.
3. Early stopping: Early stopping entails monitoring the validation loss during training and terminating the training process when the validation loss begins to rise. This reduces overfitting by preventing the model from training for too long and memorizing the training data.
4. Data augmentation: Techniques like rotation, translation, and flipping can be employed to increase the size of the training dataset, which can help reduce overfitting by giving the model more diverse examples to learn from.
5. Model architecture: The model's architecture can also be tweaked to avoid overfitting or underfitting. Here are a few examples:
a. Increasing the number of layers or neurons increases the model's capacity, whereas decreasing the number of layers or neurons decreases its capacity.
b. Increasing the randomness of the training process by using a smaller batch size during training can help reduce overfitting.
c. Employing a different activation function or optimizer can also have an impact on the model's generalizability.
Ultimately, finding the correct balance between model complexity and regularization is critical to avoiding overfitting or underfitting and creating a model that works well on new, unseen data. Depending on the problem and dataset, some strategies will be more effective than others; a short Keras sketch combining several of them is shown below.
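In this sketch, the layer size, L2 penalty strength, dropout rate, and patience value are arbitrary illustrative choices rather than tuned settings, and the commented-out training call assumes training and validation splits like the ones used in the implementation later in this section.
# Sketch: combining L2 regularization, dropout, and early stopping in Keras
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout
from tensorflow.keras.regularizers import l2
from tensorflow.keras.callbacks import EarlyStopping
model = Sequential([
    Dense(64, activation='relu', input_shape=(1,), kernel_regularizer=l2(0.01)),  # L2 penalty on weights
    Dropout(0.3),                                  # randomly drop 30% of these units during training
    Dense(1)
])
model.compile(optimizer='adam', loss='mse')
# Stop training once the validation loss has not improved for 10 consecutive epochs
early_stop = EarlyStopping(monitor='val_loss', patience=10, restore_best_weights=True)
# history = model.fit(X_train, y_train, validation_data=(X_val, y_val),
#                     epochs=500, callbacks=[early_stop], verbose=0)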
Optimal Fitting
Optimal fitting refers to the best balance between underfitting and overfitting in a machine learning model. The model is complex enough to capture the underlying patterns in the data, but not so complex that it learns the noise in the training data and fails to generalize to new, unseen data.
Optimal fitting leads to the best possible performance of the model on new data. Getting the balance of model complexity and regularization right is challenging and often requires an iterative process of training and validation.
Optimal fitting is an important goal in machine learning and is essential for building models that are accurate, robust, and generalizable to new, unseen data.
Variance and Bias problem
Variance and bias are two fundamental machine learning concepts that are connected to model performance.
Bias refers to the disparity between a model's predicted values and the true values of the target variable. It measures the average difference between the model's predictions and the actual data. A model with significant bias is said to underfit the data because it is overly simplistic and incapable of capturing the underlying patterns in the data.
Variance is the variability of model predictions across multiple training sets. It assesses how much the model's predictions differ from one another when the model is trained on different samples of data. A model with high variance has learned the noise in the training data rather than the underlying signal, which is referred to as overfitting.
To be more specific, bias is defined by how well the model fits the training data, and variance is determined by how well the model generalizes to new, previously unseen data. The goal is to find a model that balances bias and variance, which is known as the bias-variance trade-off.
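As a rough numerical illustration of this trade-off (a sketch only: the true function, polynomial degrees, noise level, and number of resampled training sets are arbitrary choices), one can compare how the predictions of a simple and a complex model behave across many independently drawn training sets:
# Sketch: comparing bias and variance of a simple and a complex model via resampling
import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
rng = np.random.RandomState(0)
true_fn = lambda x: np.sin(np.pi * x)            # the "true" relationship we want to recover
X_grid = np.linspace(-1, 1, 50).reshape(-1, 1)   # fixed points at which predictions are compared
for degree in (1, 12):                           # degree 1: simple model; degree 12: complex model
    preds = []
    for _ in range(100):                         # 100 independently drawn training sets
        X_tr = rng.uniform(-1, 1, 30).reshape(-1, 1)
        y_tr = true_fn(X_tr.ravel()) + rng.normal(0, 0.3, 30)
        est = make_pipeline(PolynomialFeatures(degree), LinearRegression()).fit(X_tr, y_tr)
        preds.append(est.predict(X_grid))
    preds = np.array(preds)
    bias_sq = np.mean((preds.mean(axis=0) - true_fn(X_grid.ravel())) ** 2)   # squared bias
    variance = np.mean(preds.var(axis=0))                                    # prediction variance
    print(f"degree {degree:2d}: bias^2 ~ {bias_sq:.3f}, variance ~ {variance:.3f}")
In this kind of experiment the low-degree model typically shows the larger bias term and the high-degree model the larger variance term, which mirrors the description above.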
Implementation
Let's implement the sample code and visualize the output.
Source Code
# Import the Required Libraries
import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from sklearn.model_selection import train_test_split
# Generate synthetic data
np.random.seed(42)
X = np.linspace(-1, 1, 100).reshape(-1, 1)  # 100 samples with a single feature
y = 2 * X.ravel() + np.random.normal(0, 0.2, size=100)
# Split data into training and validation sets
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=42)
# Define the model architecture
model = Sequential()
model.add(Dense(10, activation='relu', input_shape=(1,)))
model.add(Dense(1))
# Compile the model
model.compile(optimizer='adam', loss='mse')
# Train the model
history = model.fit(X_train, y_train, validation_data=(X_val, y_val), epochs=100, verbose=0)
# Plot the training and validation loss
plt.plot(history.history['loss'], label='Training Loss')
plt.plot(history.history['val_loss'], label='Validation Loss')
plt.xlabel('Epochs')
plt.ylabel('Loss')
plt.legend()
plt.show()
- We import the relevant libraries: TensorFlow for deep learning, Matplotlib for graphing, NumPy for numerical computations, and the necessary TensorFlow and scikit-learn modules.
- NumPy is used to produce synthetic data. To ensure reproducibility, a random seed is set. The X values consist of 100 evenly spaced points between -1 and 1, arranged as a column with one feature per sample. The y values are computed as a linear function of X with some added Gaussian noise.
- Using the scikit-learn train_test_split function, the data is divided into training and validation sets. 80% of the data, in this case, is used for training, and 20% is utilized for validation.
- We use the Sequential model from Keras, a high-level TensorFlow API, to define the neural network's architecture. A ReLU activation function and a single hidden layer with 10 units make up the model. We have one feature, as indicated by the input shape, which is specified as (1,).
- The model is compiled with the Adam optimizer and mean squared error (MSE) loss. Adam is a popular optimization algorithm that adapts the learning rate for each weight individually.
- The fit function is used to train the model. As arguments, we pass the training data, the validation data, and the number of epochs (100). To keep the output succinct during training, we disable the progress bar by setting verbose=0.
- The training and validation loss values for each epoch are contained in the history object that the fit function returns after training.
- Using Matplotlib, we plot the training loss and validation loss over the epochs. The training loss values are contained in history.history['loss'], and the validation loss values are contained in history.history['val_loss']. Comparing these two curves shows how the model behaved throughout training.
- The code's output is a plot of the training loss and validation loss as a function of the number of epochs. If the training loss keeps decreasing while the validation loss stays high or begins to rise, overfitting is present. On the other hand, if both losses are high and do not decrease significantly, underfitting is suggested. The objective is to strike a balance: minimizing the validation loss without overfitting the training data. A quick numeric check of the final loss values is sketched below.
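As a quick rule-of-thumb check (the 1.5x and 0.5 thresholds below are arbitrary values chosen for this toy problem, not standard cutoffs), the final loss values stored in the history object can be compared directly:
# Rough diagnostic using the history object returned by model.fit above
final_train_loss = history.history['loss'][-1]
final_val_loss = history.history['val_loss'][-1]
if final_val_loss > 1.5 * final_train_loss:
    print("Validation loss is much higher than training loss: possible overfitting.")
elif final_train_loss > 0.5:
    print("Both losses remain high: possible underfitting.")
else:
    print("Losses are low and close together: a reasonable fit.")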
Key points to remember
- The bias of the model represents how well it fits the training set.
- The variance of the model represents how well it fits unseen cases in the validation set.
- Underfitting is characterized by a high bias and, typically, a low variance.
- Overfitting is characterized by a large variance and a low bias.
- A neural network that underfits cannot reliably predict the training set, let alone the validation set. This is characterized by a high bias.
- Solutions for Underfitting:
- Increasing the number of layers, neurons, or input features.
- Increasing the number of training samples or improving their quality.
- Lowering the regularization parameter.
- A neural network with an overfitting problem is good at learning its training set but fails to generalize its predictions to an independent test set. This is distinguished by a low bias and a large variance.
- Solutions for Overfitting:
- Initialization of neural networks (retraining)
- Multiple neural networks
- Early Stopping
- Regularization
- Dropout