PCA (Principal Component Analysis)
Introduction
Principal Component Analysis (PCA) is a statistical method used for dimensionality reduction and feature extraction. It is widely used in machine learning, data analysis, and pattern recognition.
PCA's main objective is to transform a high-dimensional dataset into a lower-dimensional space while losing as little relevant information as possible. To do this, it finds the principal components: a set of orthogonal axes along which the data varies the most, and uses them to describe the data. The first principal component captures the largest amount of variance in the data, and each succeeding component, being orthogonal to the preceding ones, captures the maximum amount of the remaining variance.
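As a quick illustration of these two properties (the dataset below is synthetic and chosen only for this sketch), the snippet fits scikit-learn's PCA and checks that the components are mutually orthogonal and that the explained variance decreases from one component to the next:

import numpy as np
from sklearn.decomposition import PCA

# Synthetic 3-feature dataset (values are arbitrary, for illustration only)
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3)) @ np.array([[2.0, 0.5, 0.1],
                                          [0.0, 1.0, 0.3],
                                          [0.0, 0.0, 0.2]])

pca = PCA(n_components=3)
pca.fit(X)

# The principal components are orthogonal unit vectors: this product is ~ the identity matrix
print(np.round(pca.components_ @ pca.components_.T, 6))

# The explained variance ratio is sorted in decreasing order
print(pca.explained_variance_ratio_)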
The PCA procedure involves the following steps:
- Standardization: If the dataset's features are measured on different scales, the data must be transformed so that each feature has zero mean and unit variance. This step ensures that no single feature dominates the PCA process simply because of its scale.
- Covariance matrix computation: The covariance matrix is computed from the standardized data. It describes the variances of the individual features and the covariances between the different features of the dataset.
- Eigendecomposition: The covariance matrix is then decomposed into its eigenvalues and corresponding eigenvectors. Each eigenvector represents a principal component, and the matching eigenvalue indicates how much variance is explained by that component.
- Selection of components: The eigenvectors are sorted by their corresponding eigenvalues, with the eigenvector associated with the largest eigenvalue representing the most significant principal component. Choosing a subset of the top-ranked eigenvectors determines how many principal components to keep.
- Projection: The original data are projected onto the new, lower-dimensional space using the selected principal components. This transformation is a linear combination of the original features, weighted by the corresponding eigenvectors. (A NumPy sketch of these steps is given after this list.)
- PCA offers several advantages, including dimensionality reduction, visualization of high-dimensional data, noise reduction, and feature extraction. It can help identify the most significant features or patterns in the data and discard redundant or irrelevant information. Because PCA is a linear method, it assumes the underlying data has a linear structure.
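To make the steps above concrete, here is a minimal NumPy sketch of the same pipeline: standardization, covariance matrix, eigendecomposition, and projection. The toy data and the number of retained components are arbitrary choices for illustration.

import numpy as np

# Toy data: 100 samples, 5 features (arbitrary, for illustration)
rng = np.random.default_rng(42)
X = rng.normal(size=(100, 5))

# 1. Standardization: zero mean and unit variance for each feature
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

# 2. Covariance matrix of the standardized data
cov = np.cov(X_std, rowvar=False)

# 3. Eigendecomposition (eigh, since the covariance matrix is symmetric)
eigenvalues, eigenvectors = np.linalg.eigh(cov)

# 4. Sort components by decreasing eigenvalue and keep the top k
order = np.argsort(eigenvalues)[::-1]
k = 2
top_vectors = eigenvectors[:, order[:k]]

# 5. Projection of the standardized data onto the k principal components
X_projected = X_std @ top_vectors
print(X_projected.shape)  # (100, 2)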
Autoencoders
- Dimensionality reduction: Autoencoders can be used to reduce the dimensionality of high-dimensional data, making computation and visualization more efficient.
- Data denoising: Autoencoders can denoise data by learning to recover clean data from noisy input.
- Anomaly detection: Autoencoders are helpful for spotting anomalies or outliers because they learn the patterns and structure of typical data.
- Feature extraction: The compressed representation that autoencoders learn can serve as a useful feature representation for downstream tasks such as classification or clustering.
- Generative modeling: By training an autoencoder on a dataset, new data samples resembling the training data can be generated.
- Compared to other dimensionality reduction methods such as PCA, autoencoders offer more flexibility and can capture non-linear relationships in the data. However, autoencoders typically require more training data and computational resources to train effectively, and they are prone to overfitting if the training data is insufficient or the network capacity is too high. (A minimal autoencoder sketch is given after this list.)
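As a quick sketch of the basic idea (the layer sizes and the random data below are assumptions made for this example, not taken from the image demo later in the article), a small fully connected autoencoder in Keras might look like this:

import numpy as np
from tensorflow.keras.layers import Input, Dense
from tensorflow.keras.models import Model

# Toy data: 1000 samples with 20 features (arbitrary, for illustration)
X = np.random.rand(1000, 20).astype("float32")

# Encoder: compress 20 features down to a 3-dimensional latent code
inputs = Input(shape=(20,))
encoded = Dense(8, activation="relu")(inputs)
latent = Dense(3, activation="relu")(encoded)

# Decoder: reconstruct the original 20 features from the latent code
decoded = Dense(8, activation="relu")(latent)
outputs = Dense(20, activation="sigmoid")(decoded)

autoencoder = Model(inputs, outputs)
autoencoder.compile(optimizer="adam", loss="mse")

# Train the network to reproduce its own input
autoencoder.fit(X, X, epochs=10, batch_size=32, verbose=0)

# The encoder alone yields the compressed (dimensionality-reduced) representation
encoder = Model(inputs, latent)
X_compressed = encoder.predict(X)
print(X_compressed.shape)  # (1000, 3)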
Relationship Between PCA and Autoencoders
- Objective: Both PCA and autoencoders aim to reduce the dimensionality of the data while preserving the most important information. PCA is a statistical method that identifies a set of orthogonal axes (principal components) accounting for the most variance in the data. Autoencoders, in contrast, are neural network architectures trained to reconstruct the original input from a lower-dimensional latent space; in doing so, the network learns a compressed representation of the data.
- Linearity vs. non-linearity: PCA is a linear method that assumes the underlying data has a linear structure; it captures the data through linear combinations of the original features. Autoencoders, being built on neural networks, can capture non-linear relationships in the data. The non-linear transformations introduced by their hidden layers allow autoencoders to learn more complex patterns and dependencies.
- Training: PCA is an unsupervised learning technique in which the principal components are computed directly from the data matrix using eigenvalue decomposition; it requires neither explicit labels nor a training procedure. Autoencoders, however, must be trained on a particular dataset. They use an iterative optimization procedure that adjusts the weights and biases of the neural network to reduce the reconstruction error between the input and the output.
- Data reconstruction: PCA finds a linear transformation of the original data into the lower-dimensional space, whereas autoencoders learn non-linear mappings for both encoding and decoding. PCA can reconstruct the original data by projecting the lower-dimensional representation back onto the principal components; autoencoders reconstruct the input through their encoder-decoder network.
- Flexibility: Autoencoders are more flexible than PCA. The network architecture and the number of hidden layers can be changed to capture intricate, non-linear relationships in the data. Because PCA is a linear method, it may struggle to identify complex patterns in data that do not follow a linear structure.
- Despite their differences, PCA and autoencoders share some similarities. A linear autoencoder with a single hidden layer and linear activation functions has been shown, under certain conditions, to learn the same subspace as PCA. The principal components obtained from PCA can also be used as a starting point or source of inspiration when training an autoencoder. (The sketch below illustrates the subspace connection.)
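The following sketch illustrates that connection under assumed layer sizes and synthetic data (it is a demonstration, not a proof): it compares the 2-dimensional subspace found by PCA with the one learned by a linear, single-hidden-layer autoencoder trained on the same data. With sufficient training, the two subspaces typically come close, which shows up as singular values near 1.

import numpy as np
from sklearn.decomposition import PCA
from tensorflow.keras.layers import Input, Dense
from tensorflow.keras.models import Model

# Toy data with a dominant 2-dimensional structure (arbitrary, for illustration)
rng = np.random.default_rng(0)
latent = rng.normal(size=(2000, 2))
mixing = rng.normal(size=(2, 10))
X = latent @ mixing + 0.05 * rng.normal(size=(2000, 10))
X = X - X.mean(axis=0)  # center the data, as PCA does internally

# PCA: the top two principal components span the best 2-D linear subspace
pca = PCA(n_components=2).fit(X)

# Linear autoencoder: one hidden layer, linear activations, MSE loss
inputs = Input(shape=(10,))
code = Dense(2, activation="linear", use_bias=False)(inputs)
outputs = Dense(10, activation="linear", use_bias=False)(code)
linear_ae = Model(inputs, outputs)
linear_ae.compile(optimizer="adam", loss="mse")
linear_ae.fit(X, X, epochs=200, batch_size=64, verbose=0)

# Compare the subspace spanned by the decoder weights with the PCA subspace:
# singular values near 1 mean the two 2-D subspaces (nearly) coincide.
decoder_weights = linear_ae.layers[-1].get_weights()[0]   # shape (2, 10)
q_ae, _ = np.linalg.qr(decoder_weights.T)                 # orthonormal basis, shape (10, 2)
overlap = np.linalg.svd(pca.components_ @ q_ae, compute_uv=False)
print(np.round(overlap, 3))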
PCA Vs Autoencoders
| PCA | Autoencoders |
| --- | --- |
| PCA is an unsupervised learning statistical method. | Autoencoders are neural network architectures used in unsupervised learning. |
| It is mainly used for dimensionality reduction and feature extraction. | They are mostly used for feature extraction, dimensionality reduction, and representation learning. |
| PCA identifies a set of orthogonal axes (principal components) that best capture the data's overall variance. | Autoencoders learn a compressed representation of the input data by training the network to reconstruct the original input from a lower-dimensional latent space. |
| It uses linear algebra (eigenvalue decomposition) to compute the principal components directly from the data. | They must be trained on a particular dataset and use an iterative optimization technique, usually backpropagation, to minimize the reconstruction error. |
| PCA requires no training procedure or explicit labels. | Because autoencoders are built on neural networks, they can capture non-linear relationships in the data. |
| It is commonly used for data compression, noise reduction, and data visualization. | They are used for tasks such as feature extraction, data denoising, anomaly detection, and generative modeling. |
| PCA is a linear method that assumes the data has a linear structure. | Autoencoders can handle complex non-linear patterns in the data and allow more flexibility in network architecture. |
| There is no explicit encoding or decoding step in PCA. | Autoencoders use an explicit encoding step that maps the input data to a compressed representation, followed by a decoding step that reconstructs the input from the latent space. |
Implementation
import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA
from skimage.io import imread
# Load the image
image = imread('/content/istockphoto-1277541723-612x612.jpg') # Replace 'path_to_image' with the actual path to your image
# Flatten the image to a (num_pixels, 3) array and scale pixel values to the [0, 1] range
image_flat = image.reshape(-1, 3).astype(np.float64) / 255.0
# Perform PCA
n_components = min(image_flat.shape[0], image_flat.shape[1]) - 1  # Keep one fewer component than the maximum possible (here limited by the 3 color channels)
pca = PCA(n_components=n_components)
image_pca = pca.fit_transform(image_flat)
# Reconstruct the image from the PCA result
image_reconstructed = pca.inverse_transform(image_pca)
image_reconstructed = np.clip(image_reconstructed, 0, 1) # Clip values to 0-1 range
image_reconstructed = image_reconstructed.reshape(image.shape)
# Visualize the original image and its reconstruction
plt.figure(figsize=(10, 5))
plt.subplot(1, 2, 1)
plt.imshow(image)
plt.title('Original Image')
plt.subplot(1, 2, 2)
plt.imshow(image_reconstructed)
plt.title(f'PCA Reconstruction with {n_components} Components')
plt.tight_layout()
plt.show()
Obtained Output: the original image and its PCA reconstruction, displayed side by side.

# Import the required libraries
import numpy as np
import matplotlib.pyplot as plt
from tensorflow.keras.layers import Input, Conv2D, MaxPooling2D, UpSampling2D
from tensorflow.keras.models import Model
from tensorflow.keras.preprocessing.image import load_img, img_to_array
from tensorflow.keras.callbacks import EarlyStopping
# Load the image
img = load_img('/content/istockphoto-1277541723-612x612.jpg', target_size=(224, 224))
image_array = img_to_array(img)
# Normalize the image data
image_array = image_array / 255.0
# Reshape the image to (num_samples, height, width, channels)
image_array = np.expand_dims(image_array, axis=0)
# Define the input shape
input_shape = image_array.shape[1:]
# Define the size of the bottleneck (compressed) representation
encoding_dim = 32  # Number of feature maps in the bottleneck convolutional layer
# Define the autoencoder model
input_img = Input(shape=input_shape)
# Encoder
x = Conv2D(16, (3, 3), activation='relu', padding='same')(input_img)
x = MaxPooling2D((2, 2), padding='same')(x)
x = Conv2D(8, (3, 3), activation='relu', padding='same')(x)
x = MaxPooling2D((2, 2), padding='same')(x)
x = Conv2D(encoding_dim, (3, 3), activation='relu', padding='same')(x)
# Decoder
x = Conv2D(8, (3, 3), activation='relu', padding='same')(x)
x = UpSampling2D((2, 2))(x)
x = Conv2D(16, (3, 3), activation='relu', padding='same')(x)
x = UpSampling2D((2, 2))(x)
decoded = Conv2D(3, (3, 3), activation='sigmoid', padding='same')(x)
autoencoder = Model(input_img, decoded)
# Compile the model
autoencoder.compile(optimizer='adam', loss='mse')
# Train the autoencoder
history = autoencoder.fit(
    image_array,
    image_array,
    epochs=50,
    batch_size=1,
    callbacks=[EarlyStopping(monitor='loss', patience=5)]
)
# Generate the reconstructed image from the autoencoder
reconstructed_image = autoencoder.predict(image_array)[0]
# Plot the original and reconstructed images side by side
fig, ax = plt.subplots(1, 2, figsize=(10, 5))
ax[0].imshow(image_array[0])
ax[0].set_title('Original')
ax[1].imshow(reconstructed_image)
ax[1].set_title('Reconstructed Autoencoder')
plt.show()
Obtained Output: the original image and the autoencoder's reconstruction, displayed side by side.
Interpreting the Results
- Eigenvectors (principal components): These are the directions in the original feature space along which the data varies the most. Each principal component is represented by an eigenvector, and the components are ordered from most to least important, with the first component explaining the most variance in the data.
- Eigenvalues: These indicate how much variance is accounted for by each principal component. Each eigenvalue is paired with an eigenvector and reflects the importance of that component; larger eigenvalues indicate more significant components.
- PCA can be visualized with a scree plot, which shows the eigenvalues or the cumulative explained variance ratio. This plot helps in choosing how many principal components to keep. (See the sketch after this list.)
- The output of an autoencoder is the reconstructed version of its input data. The autoencoder learns a compressed representation of the input and is trained to recover the data in its original form; the trained model produces the output by passing the input through the encoder and decoder.
- For visual comparison, the original and reconstructed images can be displayed side by side. A good autoencoder produces reconstructions that closely resemble the originals, since its objective is to minimize the reconstruction error.
- By comparing the original and reconstructed images, you can evaluate how well the autoencoder captures the essential features of the input data and reconstructs it accurately.
- Although both PCA and autoencoders aim to reduce the dimensionality of the input data, their methods and goals differ: PCA finds the orthogonal directions of highest variance, while autoencoders are trained to learn a compressed representation and reconstruct the input data from it.
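To illustrate the scree plot mentioned above, the snippet below fits PCA on a small synthetic dataset (the data itself is an arbitrary assumption for this sketch) and plots the per-component and cumulative explained variance ratios; the number of components to keep is usually read off where the cumulative curve flattens.

import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA

# Synthetic 10-feature dataset (arbitrary, for illustration only)
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 10)) @ rng.normal(size=(10, 10))

# Fit PCA with all components to inspect the full variance spectrum
pca_full = PCA().fit(X)
explained = pca_full.explained_variance_ratio_
components = np.arange(1, len(explained) + 1)

# Scree plot: per-component and cumulative explained variance ratio
plt.figure(figsize=(6, 4))
plt.bar(components, explained, label='Explained variance ratio')
plt.step(components, np.cumsum(explained), where='mid', label='Cumulative ratio')
plt.xlabel('Principal component')
plt.ylabel('Explained variance ratio')
plt.title('Scree Plot')
plt.xticks(components)
plt.legend()
plt.tight_layout()
plt.show()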
Conclusion
- The suitability of PCA and autoencoders depends on the particular problem and dataset at hand; each has its own advantages and disadvantages.
- PCA is a linear method that relies on the existence of a linear structure in the data, so non-linear relationships may not be adequately captured. Autoencoders, being neural network-based models, can learn more expressive representations and capture intricate non-linear patterns.
- Compared to PCA, autoencoders demand more computational power and training time.
- PCA provides a clear interpretation of its principal components, whereas autoencoders focus on learning useful representations without any obvious interpretability.
- In conclusion, PCA is a well-established technique for linear dimensionality reduction and feature extraction, whereas autoencoders offer greater flexibility and capacity for capturing non-linear relationships. The choice between PCA and autoencoders depends on the nature of the data, the required level of interpretability, and the specific objectives of the analysis or application.